This dimension evaluates the extent to which the company has implemented comprehensive risk mitigation strategies across three critical areas: containment (controlling access to AI models), deployment (preventing misuse and accidental harms), and assurance processes (providing affirmative evidence of safety). Additionally, it assesses whether the company continuously monitors both Key Risk Indicators (KRIs) and Key Control Indicators (KCIs) throughout the AI system's lifecycle, from training through deployment.
3.1 Implementing Mitigation Measures (50%)
3.1.1 Containment measures (35%)
3.1.1.1 Containment measures are precisely defined for all KCI thresholds (60%)
- The planned or already implemented containment measures are precisely defined for all containment KCI thresholds.
3.1.1.2 Proof that containment measures are sufficient to meet the KCI thresholds (40%)
- Proof is provided to justify that the containment measures are sufficient to meet the relevant containment KCI prior to their implementation (i.e., before the corresponding KRI threshold is crossed). That is, the suggested containment measures are shown to actually satisfy the relevant containment KCI threshold, or at minimum evidence is given, with confidence levels, for why these measures are likely to satisfy the threshold.
- Partial credit is given if there exists a process for soliciting proof. However, to score above 50%, the first item must be satisfied.
- The implementation of the KRI-KCI pairing is predictable in advance, leaving as little to discretion as possible.
- The sufficiency criteria are determined ex ante, with justification for why these criteria constitute adequate proof.
3.1.1.3 Strong third-party verification process to verify that the containment measures meet the KCI thresholds (100% if greater than the weighted average of 3.1.1.1 and 3.1.1.2)
- There is an external structured process for proving that containment measures are sufficient to meet the relevant containment KCI, such as a security audit, prior to their implementation (i.e., before the corresponding KRI threshold is crossed).
- Detail is provided on how experts are chosen, including the required expertise and guarantees of independence.
- External reports are made available (with sensitive information redacted) to give a sense of the third parties' confidence that the measures meet the threshold.
3.1.2 Deployment measures (35%)
3.1.2.1 Deployment measures are precisely defined for all KCI thresholds (60%)
- The planned or already implemented deployment measures are precisely defined for all deployment KCI thresholds.
3.1.2.2 Proof that deployment measures are sufficient to meet the KCI thresholds (40%)
- There is a pre-emptive justification that the measures are sufficient to meet the relevant deployment KCI (e.g., "this quantity of rejection fine-tuning would enable us to reach our target of 99.9% jailbreak resistance, as shown by these experiments [...]").
- Partial credit is given if there exists a process for soliciting proof. However, to gain marks over 50, the first item should be satisfied.
- The implementation of the KRI-KCI pairing is predictable in advance, leaving as little to discretion as possible.
- The sufficiency criteria are determined ex ante, with justification for why these criteria constitute adequate proof.
3.1.2.3 Strong third-party verification process to verify that the deployment measures meet the KCI thresholds (100% if greater than the weighted average of 3.1.2.1 and 3.1.2.2)
- There is an external structured process for proving that deployment measures are sufficient to meet the relevant deployment KCI, such as external red-teaming of safeguards.
- Detail is provided on how experts are chosen, including the required expertise and guarantees of independence.
- External reports are made available (with sensitive information redacted) to give a sense of the third parties' confidence that the measures meet the threshold.
3.1.3 Assurance processes (30%)
3.1.3.1 Credible plans towards the development of assurance properties (40%)
- Frameworks must state whether current assurance processes are sufficient to meet the required assurance process KCI.
- If insufficient, the framework should articulate (a) at what KRI the assurance processes become necessary, and (b) justification for why they believe they will have sufficient assurance processes by the time the relevant KRI is reached, including (c) technical milestones and estimates of when these milestones will need to be reached given forecasted capabilities growth
- If no plans are given, 3.1.3.2 must be at least 50%.
3.1.3.2 Evidence that the assurance processes are sufficient to achieve their corresponding KCI thresholds (40%)
- A process is defined for how the company will empirically collect evidence showing that assurance processes are credible, such as via demonstrations with model organisms or theoretical proofs such as Irving et al. (2023).
3.1.3.3 The underlying assumptions that are essential for assurance processes to be sufficient in 3.1.3.2 are clearly outlined, and justified (20%)
- The framework outlines the key technical assumptions underlying its current planned assurance approach, for instance that there is no undetected sandbagging, or that the chain of thought is faithful.
- There is justification for why these assumptions are valid, e.g., stress-testing of these assumptions is performed and results are presented with confidence levels.
3.2 Continuous Monitoring and Comparing Results with Pre-determined Thresholds (50%)
3.2.1 Monitoring of KRIs (40%)
3.2.1.1 Justification that elicitation methods used during the evaluations are comprehensive enough to match the elicitation efforts of potential threat actors (30%)
- For KRIs that require capability assessments (evaluations), the framework provides justification for why their elicitation methods will upper bound the elicited capabilities given the resources and methods available to potential threat actors, for the relevant risk model for that KRI. For instance, evaluations may be performed on a model that is finetuned for the task, or the helpful only model, to account for the case that threat actors attain weights / remove the effect of sandbagging and upper bound capabilities.
- There is detail on these elicitation methods, e.g. the amount of compute used for finetuning.
3.2.1.2 Specification of evaluation frequency is given in measurable, predictable terms (effective compute variation and fixed time periods) (25%)
- The framework outlines some process for regularly re-conducting evaluations with maximal elicitation, to factor in previously unaccounted post training enhancements or elicitation methods.
- The frequency should be based both on time (e.g., every 6 months) and on scaling progress (e.g., increases in effective compute used in training trigger evaluation against more advanced KRIs).
- The chosen frequency is justified with clear reasoning.
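The dual trigger described above can be sketched as a simple check. This is an illustrative sketch only; the specific thresholds (six months, a 4x growth in effective compute) are hypothetical placeholders, not values prescribed by any framework.

```python
from datetime import datetime, timedelta

# Hypothetical trigger values -- a real framework would justify these choices.
REEVAL_INTERVAL = timedelta(days=182)   # time-based trigger: roughly every 6 months
COMPUTE_MULTIPLIER = 4.0                # scaling trigger: 4x effective training compute

def needs_reevaluation(last_eval_date: datetime,
                       last_eval_compute: float,
                       current_compute: float,
                       now: datetime) -> bool:
    """Flag that capability evaluations must be re-run if EITHER the fixed
    time period has elapsed OR effective training compute has grown past
    the multiplier since the last evaluation."""
    time_due = now - last_eval_date >= REEVAL_INTERVAL
    compute_due = current_compute >= last_eval_compute * COMPUTE_MULTIPLIER
    return time_due or compute_due
```

Requiring either condition (rather than both) ensures that a rapid scale-up cannot outpace a purely calendar-based evaluation schedule, and that a model left unevaluated for a long period is still re-tested even without scaling progress.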
3.2.1.3 Description of how post-training enhancements are factored into capability assessments (15%)
- For KRIs that require capability assessments (evaluations), there is explicit documentation of (a) the specific methodologies used to incorporate post-training enhancements into capability measurements, and/or (b) the size of the safety/uncertainty margin used to account for post-training enhancements that occur after evaluation is complete, with justification for the size of this margin based on forecasting exercises informed by the speed of progress of previous post-training enhancements.
- The uncertainty margin accounts for how post-training enhancements vary with model structure – namely, post-training enhancements are much more scalable with reasoning models, as inference compute can often be scaled to improve capabilities.
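As an illustration of option (b), a margin can be added on top of the measured score before the KRI comparison. This is a minimal sketch; the function names and numeric values are hypothetical, and the margin itself would come from the forecasting exercises described above.

```python
def effective_capability(measured_score: float, margin: float) -> float:
    """Measured capability plus an uncertainty margin covering
    post-training enhancements discovered after the evaluation."""
    return measured_score + margin

def kri_crossed(measured_score: float, margin: float, kri_threshold: float) -> bool:
    """Treat the KRI as crossed if the margin-adjusted score reaches the
    threshold, so the decision errs on the side of caution."""
    return effective_capability(measured_score, margin) >= kri_threshold
```

For example, a model measured at 0.70 on some benchmark with a 0.10 margin would be treated as having crossed a 0.75 KRI threshold, even though the raw measurement falls below it.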
3.2.1.4 Vetting of KRI assessment protocols by third parties (15%)
- There is a process for independent third parties to review the internal methods for assessing KRI status, including evaluation methodologies.
- Detail is provided on how experts are chosen, including the required expertise and guarantees of independence.
3.2.1.5 KRI assessments are conducted, replicated or audited by third parties (15%)
- There is a process for assessing KRI assessment results externally (i.e., by independent third parties), to ensure that KRI assessments are accurate. This could materialise as internal KRI assessments being replicated by external parties (or audited), or KRI assessments being outsourced to third parties.
3.2.2 Monitoring of KCIs (40%)
3.2.2.1 Detailed description of safeguard efficacy methodology, with empirical justification that the KCI measures will continue to satisfy KCI thresholds (40%)
- The framework describes systematic, ongoing monitoring to ensure mitigation effectiveness is tracked continuously such that the KCI threshold will be met, when required.
- There is justification that threshold crossings will be detected within suitable confidence levels. The framework includes failure mode analysis or some other methodology to minimise the chance of failure.
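One way to make "suitable confidence levels" concrete is to require that a statistical lower bound on measured safeguard efficacy, not just the point estimate, clears the KCI threshold. The sketch below uses a Wilson score interval; the 99.9% jailbreak-block target is a hypothetical example, not a value from any specific framework.

```python
import math

def wilson_lower_bound(successes: int, trials: int, z: float = 2.576) -> float:
    """Lower bound of the Wilson score interval for a binomial proportion
    (z = 2.576 corresponds to roughly a 99% two-sided interval)."""
    if trials == 0:
        return 0.0
    p = successes / trials
    denom = 1 + z**2 / trials
    centre = p + z**2 / (2 * trials)
    spread = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2))
    return (centre - spread) / denom

def kci_satisfied(blocked: int, attempts: int, threshold: float = 0.999) -> bool:
    """Count the KCI as met only if the confidence-interval lower bound
    on safeguard efficacy clears the (hypothetical) 99.9% threshold."""
    return wilson_lower_bound(blocked, attempts) >= threshold
```

With 999 blocked jailbreak attempts out of 1,000, the point estimate (99.9%) touches the threshold but the lower confidence bound does not, so monitoring on this rule would flag the KCI as unmet until more evidence accumulates.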
3.2.2.2 Vetting of KCI assessment protocols by third parties (30%)
- There is a process for independent third parties to review the methods for assessing the efficacy of KCI measures.
3.2.2.3 KCI assessments are conducted, replicated or audited by third parties (30%)
- There is a process for assessing KCIs internally and externally (i.e., by independent third parties), to ensure that KCI assessments are accurate. This means either internal KCI assessments are replicated by external parties (audited), or KCI assessments are outsourced to third parties.
- Detail is provided on how experts are chosen, including the required expertise and guarantees of independence.
3.2.3 Transparency of KRI/KCI assessment results (10%)
3.2.3.1 Sharing of KRI and KCI assessment results with relevant stakeholders as appropriate (85%)
- If a KRI is crossed for any risk domain, the company commits to notifying regulators/the relevant government authorities in a timely manner.
- All KRI and KCI assessments (i.e., evaluations) are made public, according to predefined criteria.
3.2.3.2 Commitment from the company to not interfere with nor suppress external KRI/KCI assessments' findings (15%)
- The framework commits to permitting the reports, which detail the results of external evaluations (i.e. any KRI or KCI assessments conducted by third parties), to be written independently.
3.2.4 Monitoring for novel risks (10%)
3.2.4.1 Identifying novel risks post-deployment: engages in some process (post deployment) explicitly for identifying novel risk domains or novel risk models within known risk domains (50%)
- There is a structured process for identifying novel risk domains or novel risk models within known risk domains.
- There is justification for why this process will identify novel risks.
3.2.4.2 Mechanism to incorporate novel risks identified post-deployment (50%)
- Novel risks or risk pathways identified via monitoring post-deployment trigger further risk modeling and scenario analysis. This may include updating multiple or all risk models. (For instance, encountering evidence of instrumental reasoning via open-ended red teaming likely requires updates to multiple risk models.)