1.3.1 The company uses risk models for all the risk domains identified and the risk models are published (with potentially dangerous information redacted) (40%) 25%
In the “Risk Identification and Analysis” section, the framework first sets out its risk analysis methodology, then the hazards (i.e., risk domains) it focuses on, and states that for each harm in a given risk domain, the pre-mitigation risk level will be determined by estimating the likelihood, severity and observability of the harm.
They define their risk assessment methodology with their scoring table (Table 1). Because this involves scoring factors such as duration, detectability and frequency, it likely involves some modelling of how threats may be realized. However, it is not clear how they arrive at scores for each component of this risk analysis (e.g. duration, detectability, frequency). To improve, scores should be informed by risk modelling that includes causal pathways to harm with discrete, measurable steps, and the methodology for this risk modelling should be precisely defined.
Whilst this is notable, as it means there is a structured methodology for arriving at risk determinations, it is too high-level to count as risk modelling. To improve, the company should break down causal pathways of harm step by step, with distinct threat scenarios (see the sketch below), in order to inform the likelihood/severity/observability scores. In addition, these risk models and threat scenarios should then be published.
However, they do give differential ‘model risk’ scores, depending on the model’s use case, expected level of capability, and autonomy. This pre-emptive assessment of potential manifestations of harm shows some awareness of risk modeling, which is rewarded here.
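To make the recommendation above concrete, the following is a minimal sketch of how a causal pathway to harm could be decomposed into discrete, measurable steps whose estimates feed a likelihood score. The step names, probabilities, and the `Step`/`ThreatScenario` structures are invented here for exposition; nothing in this sketch is drawn from NVIDIA’s framework.

```python
from dataclasses import dataclass
from math import prod

@dataclass
class Step:
    """One discrete, measurable step in a causal pathway to harm."""
    description: str
    p_success: float  # estimated probability the step is achieved, ideally tied to an evaluation

@dataclass
class ThreatScenario:
    """A distinct threat scenario: an ordered sequence of steps leading to a harm."""
    name: str
    steps: list[Step]

    def likelihood(self) -> float:
        # Treating the steps as sequential and independent (a simplifying assumption),
        # the pathway likelihood is the product of the step probabilities. This is the
        # kind of quantity that could inform the framework's "likelihood" factor.
        return prod(step.p_success for step in self.steps)

# Purely illustrative numbers for a hypothetical misuse pathway.
scenario = ThreatScenario(
    name="Hypothetical misuse pathway",
    steps=[
        Step("Actor obtains access to the model", 0.5),
        Step("Model provides meaningful uplift for the harmful task", 0.2),
        Step("Actor executes the task despite existing safeguards", 0.1),
    ],
)
print(f"{scenario.name}: estimated likelihood {scenario.likelihood():.3f}")
```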
Quotes:
“Each risk criteria have discrete thresholds between 1 and 5 that are used to determine a model’s risk category. The [Preliminary Risk Assessment] will assign a model risk (MR) score between 1 and 5 based on the highest MR score within this criteria. Below is a nonexhaustive list of attributes used to define the MR score. The MR score is correlated to the maximum permissible harm relative to our trustworthy AI principles. High risk models require more intensive scrutiny, increased oversight and face stricter development and operational constraints.” (p. 2)
“NVIDIA’s Trustworthy AI Principles are derived from human rights and legal principles. These principles are used as a foundation for defining a broad range of potential risks that a product may be exposed to. Based on the description of a product’s architecture and development workflows it should be possible to identify possible hazards, estimate the level of risk for each hazard and categorize the cumulative risk relative to our trustworthy AI principles.
We defined risk as the potential for an event to lead to an undesired outcome, measured in terms of its likelihood (probability), its impact (severity) and its ability to be controlled or detected (controllability). The risk associated with each hazard is scored between 1 and 64, with the higher value indicating a higher risk.
Risk = likelihood x severity x observability
Risk = frequency x (duration + speed of onset) x (detectability + predictability)
A hazard that has a non-zero but very low probability of occurring, that is transient in nature, occurs gradually, easy to detect and localized has the lowest risk score. In contrast, a hazard that has a high probability of occurring, is permanent in nature, occurs instantaneously and randomly due to latent faults has the highest risk score.” (p. 6)
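For illustration only, the quoted scoring arithmetic can be read as a simple multiplicative index. The sketch below is not NVIDIA’s implementation: the `risk_score` helper and the 1–4 integer range assumed for each composite factor are placeholders chosen so that the product spans the stated 1–64 range; the framework’s own sub-scales (it decomposes severity into duration + speed of onset, and observability into detectability + predictability) are set out in its Table 1 and are not reproduced here.

```python
# Minimal sketch of the quoted risk-scoring arithmetic. The 1-4 range for each
# composite factor is an assumption made for illustration, chosen so that the
# product spans the 1-64 range stated in the framework.

def risk_score(likelihood: int, severity: int, observability: int) -> int:
    """Risk = likelihood x severity x observability."""
    for factor in (likelihood, severity, observability):
        if not 1 <= factor <= 4:
            raise ValueError("each factor is assumed to be an integer from 1 to 4")
    return likelihood * severity * observability

# A low-probability, transient, gradual, easy-to-detect hazard sits at the
# bottom of the range; a high-probability, permanent, instantaneous,
# unpredictable hazard sits at the top.
print(risk_score(likelihood=1, severity=1, observability=1))   # 1
print(risk_score(likelihood=4, severity=4, observability=4))   # 64
```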
1.3.2 Risk modeling methodology (40%) 9%
1.3.2.1 Methodology precisely defined (70%) 0%
They define their risk assessment methodology with their scoring table (Table 1). However, it is not clear how they arrive at scores for each component of this risk analysis (e.g. duration, detectability, frequency). Indeed, no risk modeling methodology is defined for actually mapping out how harms may be realized.
Quotes:
“We defined risk as the potential for an event to lead to an undesired outcome, measured in terms of its likelihood (probability), its impact (severity) and its ability to be controlled or detected (controllability). The risk associated with each hazard is scored between 1 and 64, with the higher value indicating a higher risk.
Risk = likelihood x severity x observability
Risk = frequency x (duration + speed of onset) x (detectability + predictability)
A hazard that has a non-zero but very low probability of occurring, that is transient in nature, occurs gradually, easy to detect and localized has the lowest risk score. In contrast, a hazard that has a high probability of occurring, is permanent in nature, occurs instantaneously and randomly due to latent faults has the highest risk score.” (p. 6)
See Table 1 in the Framework, on page 7.
1.3.2.2 Mechanism to incorporate red teaming findings (15%) 10%
Whilst there is mention of incorporating hazards identified during red-teaming, showing awareness that red-teaming may uncover new risks to consider and analyse, this only covers risks that were prespecified but previously absent. To improve, open-ended red-teaming should be conducted, and when novel risks or risk pathways are discovered, this should trigger new risk modelling of other affected risk domains.
Quotes:
“For frontier models we need to consider speculative risks that may or may not be present in the model. To help detect specific adversarial capabilities, models will be stress-tested against extreme but plausible scenarios that may lead to systemic risks. This approach ensures that both known and emergent hazards are taken into account.” (p. 7)
1.3.2.3 Prioritization of severe and probable risks (15%) 50%
There is a clear prioritization of risk domains (‘hazards’, in NVIDIA’s terms) by severity and likelihood, as well as controllability. These are applied across the full space of risk models.
To improve, probability and severity scores (qualitative or quantitative) should be published for different risk models, with justification given for these scores.
It is commendable that they further broke down severity and added observability, showing nuance.
Quotes:
“We defined risk as the potential for an event to lead to an undesired outcome, measured in terms of its likelihood (probability), its impact (severity) and its ability to be controlled or detected (controllability). The risk associated with each hazard is scored between 1 and 64, with the higher value indicating a higher risk.
Risk = likelihood x severity x observability
Risk = frequency x (duration + speed of onset) x (detectability + predictability)
A hazard that has a non-zero but very low probability of occurring, that is transient in nature, occurs gradually, easy to detect and localized has the lowest risk score. In contrast, a hazard that has a high probability of occurring, is permanent in nature, occurs instantaneously and randomly due to latent faults has the highest risk score.” (p. 6)
“Based on the description of a product’s architecture and development workflows it should be possible to identify possible hazards, estimate the level of risk for each hazard and categorize the cumulative risk relative to our trustworthy AI principles.” (p. 6)