Meta

Overall rating: Weak (1.1/5)
Rating scale: very weak, weak, moderate, substantial, strong

Risk Identification: 23%
Risk Analysis and Evaluation: 30%
Risk Treatment: 20%
Risk Governance: 15%
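
As a point of reference, the headline rating appears consistent with a simple unweighted average of the four section scores above, rescaled to a 5-point scale. The check below is a minimal sketch under that assumption; the exact aggregation rule is not stated on this page.

```python
# Hypothetical check (assumption: the overall score is the unweighted mean
# of the four section scores, rescaled to a 5-point scale).
section_scores = {
    "Risk Identification": 23,
    "Risk Analysis and Evaluation": 30,
    "Risk Treatment": 20,
    "Risk Governance": 15,
}
mean_pct = sum(section_scores.values()) / len(section_scores)  # 22.0
overall = round(mean_pct / 100 * 5, 1)                         # 1.1
print(mean_pct, overall)
```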
Best in class


  • Their risk analysis and evaluation section is scored best in class overall; for instance, they are the only company to monitor risk levels external to the models’ capabilities.
  • There is a clear commitment to put development on hold until sufficient controls are implemented to meet critical thresholds. There is a clear process for this determination. They are the only company to do both; this is highly commendable.
  • Meta’s approach to risk modeling is best in class. They have a structured process to work with experts to conduct risk modeling as a way to identify risks and inform risk thresholds.
  • Meta also shows a best-in-class awareness that frontier AI may introduce novel harms, which cannot be pre-empted. They are willing to incorporate “entirely novel risk domains” into their risk modeling, informed by events external to model capabilities such as changes in the threat landscape.
Overview
Highlights relative to others

Risk modelling is clearly motivated by a risk tolerance.

Clearer links between risk thresholds and containment thresholds.

Risk thresholds are more clearly quantitatively defined.

Weaknesses relative to others

Weaker risk culture, lacking a central risk function and a strong tone from the top.

Bottom three companies for risk governance, by our ratings.

No mention of loss of control risks or assurance processes.

Lacking third-party involvement across all activities.

1.1 Classification of Applicable Known Risks (40%) 18%

1.1.1 Risks from literature and taxonomies are well covered (50%) 25%

The framework covers cybersecurity risks and chemical and biological risks. There is no reference to obtaining risks from the literature, or justification for why they selected these domains. To improve, the risk domains should include all those listed in 1.1.1, and the framework should reference the documents that informed their risk selection.

They do not include other risks such as nuclear and radiological risks, persuasion, loss of control risks, and AI R&D, and their score for 1.1.2 is below 50%.

Quotes:

“This sub-section outlines the catastrophic outcomes that are in scope of our Framework. We include catastrophic outcomes in the following risk domains: Cybersecurity and Chemical & Biological risks. It is important to reiterate that these catastrophic outcomes do not reflect current capabilities of our models, but are included based on our threat modelling.” (p. 14)

1.1.2 Exclusions are clearly justified and documented (50%) 10%

There is no justification for why they have not included some risks, such as AI R&D, radiological and nuclear risks, persuasion, and loss of control risks. This is particularly notable given that their criteria for including risks are very similar to OpenAI’s, which does include AI R&D as a tracked risk category.

Implicitly, their criteria for inclusion (plausible, catastrophic, net new, and instantaneous or irremediable) justify when risks are not included. However, a more explicit link between excluded risks and the criteria they fail is needed.

Quotes:

“For this Framework specifically, we seek to consider risks that satisfy all four criteria:

Plausible: It must be possible to identify a causal pathway for the catastrophic outcome, and to define one or more simulatable threat scenarios along that pathway. This ensures an implementable, evidence-led approach.

Catastrophic: The outcome would have large scale, devastating, and potentially irreversible harmful effects.

Net new: The outcome cannot currently be realized as described (e.g. at that scale/by that threat actor/for that cost) with existing tools and resources.

Instantaneous or irremediable: The outcome is such that once realized, its catastrophic impacts are immediately felt, or inevitable due to a lack of feasible measures to remediate.” (p. 12)

1.2 Identification of Unknown Risks (Open-ended red teaming) (20%) 0%

1.2.1 Internal open-ended red teaming (70%) 0%

The framework doesn’t mention any pre-deployment procedures to identify novel risk domains or risk models for the frontier model. To improve, they should commit to such a process to identify either novel risk domains or novel risk models/changed risk profiles within pre-specified risk domains (e.g. the emergence of an extended context length enabling improved zero-shot learning, which changes the risk profile), and provide the methodology, resources, and required expertise.

Quotes:

No relevant quotes found.

1.2.2 Third party open-ended red teaming (30%) 0%

The framework doesn’t mention any third-party pre-deployment procedures to identify novel risk domains or risk models for the frontier model. To improve, they should commit to an external process to identify either novel risk domains or novel risk models/changed risk profiles within pre-specified risk domains (e.g. the emergence of an extended context length enabling improved zero-shot learning, which changes the risk profile), and provide the methodology, resources, and required expertise.

Quotes:

No relevant quotes found.

1.3 Risk modeling (40%) 41%

1.3.1 The company uses risk models for all the risk domains identified and the risk models are published (with potentially dangerous information redacted) (40%) 50%

Risk modelling is clearly conducted for each risk domain. The list of threat scenarios is published for each risk domain, whilst keeping generality for security reasons. There is a clear reliance on risk modelling for determining “whether this model may pose novel risks”.

To improve, more detail should be published on the risk models, including causal pathways (with sensitive information redacted). This is to show evidence of risk modeling and to allow scrutiny from experts. Detail on the methodology and the experts involved should also be published. They should also publish the risk models which were not prioritized (i.e., the broader set considered before prioritization).

Quotes:

“Our Framework is structured around a set of catastrophic outcomes. We have used threat modelling to develop threat scenarios pertaining to each of our catastrophic outcomes. We have identified the key capabilities that would enable the threat actor to realize a threat scenario. We have taken into account both state and non-state actors, and our threat scenarios distinguish between high- or low-skill actors.” (p. 4)

“If we expect that a model may significantly exceed current frontier capabilities, we will conduct an ex-ante threat modelling exercise to help us determine whether this model may pose novel risks […]

In addition to our AI risk assessment (see below), which covers known potential risks, we conduct periodic threat modelling exercises as a proactive measure to anticipate catastrophic risks from our frontier AI. In the event that we identify that a model can enable the end-to-end execution of a threat scenario for a catastrophic outcome, we will conduct a threat modelling exercise in line with the processes in Section 3.2.

The exact format of these exercises may vary. The general process is as follows:

  1. Host workshops with experts, including external subject matter experts where relevant, to identify new catastrophic outcomes and/or threat scenarios.
  2. If new catastrophic outcomes and/or threat scenarios are identified, design new assessments to test for them, in consultation with external experts where relevant.” (pp. 6-7)

“For each catastrophic outcome, we include a description of one or more threat scenarios. See Section 3.2 for more information on how we have developed our threat scenarios. We are not providing full details of the constituent steps and tasks within a threat scenario, or the enabling capabilities required to achieve it as we want to better understand how to balance transparency and security in this regard.” (p. 14)

Coupled with each outcome (risk tolerance) is a threat scenario, describing the steps involved for this outcome to be realized.
For instance, for the outcome “Cyber 1: Automated end-to-end compromise of a best-practice protected corporate-scale environment (ex. Fully patched, MFA-protected)”, the threat scenario is “TS.1.1: End-to-End compromise of a fully patched environment protected by state of the art security best practices. Complete end to end automation of cyber operations to achieve a goal like ransoming or comprehensive theft of a company’s critical IP using a chain of techniques- such as network infiltration, sensitive data discovery, exfiltration, privilege escalation, and lateral movement – for significantly less than cost of services on black market and/or in a short amount of time.” (p. 14) More examples can be found on pp. 14-15.

1.3.2 Risk modeling methodology (40%) 39%

1.3.2.1 Methodology precisely defined (70%) 50%

The methodology for the overall threat modeling process is defined. To improve, more detail is required; e.g., whilst they mention that they “map the potential causal pathways that could produce [catastrophic outcomes]”, Meta could identify the individual steps of each pathway to the threat scenario more precisely, using techniques such as event trees or fault trees, or describe how they elicit information from experts to inform their risk models.

Quotes:

“We start by identifying a set of catastrophic outcomes we must strive to prevent, and then map the potential causal pathways that could produce them. When developing these outcomes, we’ve considered the ways in which various actors, including state level actors, might use/misuse frontier AI. We describe threat scenarios that would be potentially sufficient to realize the catastrophic outcome, and we define our risk thresholds based on the extent to which a frontier AI would uniquely enable execution of any of our threat scenarios.” (p. 10)

“We design assessments to simulate whether our model would uniquely enable these scenarios, and identify the enabling capabilities the model would need to exhibit to do so. Our first set of evaluations are designed to identify whether all of these enabling capabilities are present, and if the model is sufficiently performant on them. If so, this would prompt further evaluation to understand whether the model could uniquely enable the threat scenario […] It is important to note that the pathway to realize a catastrophic outcome is often extremely complex, involving numerous external elements beyond the frontier AI model. Our threat scenarios describe an essential part of the end-to-end pathway. By testing whether our model can uniquely enable a threat scenario, we’re testing whether it uniquely enables that essential part of the pathway. If it does not, then we know that our model cannot be used to realize the catastrophic outcome, because this essential part is still a barrier. If it does and cannot be further mitigated, we assign the model to the critical threshold.

This would also trigger a new threat modelling exercise to develop additional threat scenarios along the causal pathway so that we can ascertain whether the catastrophic outcome is indeed realizable, or whether there are still barriers to realizing the catastrophic outcome.” (p. 11)

“Threat modelling is a structured process of identifying how different threat actors could leverage frontier AI to produce specific – and in this instance catastrophic – outcomes. This process identifies the potential causal pathways for realizing the catastrophic outcome.

Threat scenarios describe how different threat actors might achieve a catastrophic outcome. Threat scenarios may be described in terms of the tasks a threat actor would use a frontier AI model to complete, the particular capabilities they would exploit, or the tools they might use in conjunction to realize the catastrophic outcome.” (p. 20)

1.3.2.2 Mechanism to incorporate red teaming findings (15%) 0%

No mention of risks identified during open-ended red teaming or evaluations triggering further risk modeling.

Quotes:

No relevant quotes found.

1.3.2.3 Prioritization of severe and probable risks (15%) 25%

There is an explicit intent to prioritize “the most urgent catastrophic outcomes” amongst all the identified causal pathways (i.e. risk models). For a risk to be monitored, they also require that the risk pathway deriving from the model is plausible and catastrophic; the latter criterion prioritizes severity, whilst the former prioritizes nonzero probability. It is commendable that this prioritization occurs over the full space of risk models, rather than from pre-specified risk domains.

However, importantly, the framework does not detail the list of identified scenarios, the justification for why the chosen risk models are the most severe or probable, or the severity and probability scores of deprioritized risk models. To improve, they could reference their risk modelling work in the framework, such as Wan et al. (2024).

Quotes:

“We start by identifying a set of catastrophic outcomes we must strive to prevent, and then map the potential causal pathways that could produce them. When developing these outcomes, we’ve considered the ways in which various actors, including state level actors, might use/misuse frontier AI. We describe threat scenarios that would be potentially sufficient to realize the catastrophic outcome, and we define our risk thresholds based on the extent to which a frontier AI would uniquely enable execution of any of our threat scenarios.

[…]
An outcomes-led approach also enables prioritization. This systematic approach will allow us to identify the most urgent catastrophic outcomes – i.e., within the domains of cybersecurity and chemical and biological weapons – and focus our efforts on avoiding these outcomes rather than spreading efforts across a wide range of theoretical risks from particular capabilities that may not plausibly be presented by the technology we are actually building.” (p. 10)

“For this Framework specifically, we seek to consider risks that satisfy all four criteria:

Plausible: It must be possible to identify a causal pathway for the catastrophic outcome, and to define one or more simulatable threat scenarios along that pathway.

Catastrophic: The outcome would have large scale, devastating, and potentially irreversible harmful effects.

Net new: The outcome cannot currently be realized as described (e.g. at that scale / by that threat actor / for that cost) with existing tools and resources.

Instantaneous or irremediable: The outcome is such that once realized, its catastrophic impacts are immediately felt, or inevitable due to a lack of feasible measures to remediate.” (p. 12)

1.3.3 Third party validation of risk models (20%) 25%

External experts are engaged when developing risk models. External experts are also involved in “threat modelling exercises” which “explore, in a systematic way, how frontier AI models might be used to produce catastrophic outcomes.” This does not constitute validation, however – to improve, external experts should review final threat models. Nonetheless, the effort to ensure that third party expert opinion is present in the risk modelling process is commendable.

Quotes:

“In the event that we identify that a model can enable the end-to-end execution of a threat scenario for a catastrophic outcome, we will conduct a threat modelling exercise in line with the processes in Section 3.2.

The exact format of these exercises may vary. The general process is as follows:

  1. Host workshops with experts, including external subject matter experts where relevant, to identify new catastrophic outcomes and/or threat scenarios.” (pp. 6-7)

“Threat modelling is fundamental to our outcomes-led approach. We run threat modelling exercises both internally and with external experts with relevant domain expertise, where required. The goal of these exercises is to explore, in a systematic way, how frontier AI models might be used to produce catastrophic outcomes. Through this process, we develop threat scenarios which describe how different actors might use a frontier AI model to realize a catastrophic outcome.” (p. 10)

“Our threat modelling is informed by our own internal experts’ assessment of the catastrophic risks that frontier models might pose, as well as engagements with governments, external experts, and the wider AI community. However, there remains quite considerable divergence in expert opinion as to how AI capabilities will develop and the time horizons on which they could emerge.” (p. 11)
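
As an aside on how the percentages in this section appear to combine: the Risk Identification score is approximately what one obtains by treating the percentage in each criterion heading as that criterion’s weight within its parent and taking weighted averages of the sub-scores. The sketch below is a minimal illustration under that assumption; the exact rounding and aggregation rules are not stated in this grading.

```python
# Minimal sketch (assumption: heading percentages are weights within the
# parent criterion, and scores combine as weighted averages).
def weighted(pairs):
    """pairs = [(weight, score_percent), ...] with weights summing to 1."""
    return sum(w * s for w, s in pairs)

s_1_1   = weighted([(0.5, 25), (0.5, 10)])                  # ~17.5 -> reported 18%
s_1_3_2 = weighted([(0.7, 50), (0.15, 0), (0.15, 25)])      # ~38.8 -> reported 39%
s_1_3   = weighted([(0.4, 50), (0.4, s_1_3_2), (0.2, 25)])  # ~40.5 -> reported 41%
s_risk_identification = weighted([(0.4, s_1_1), (0.2, 0), (0.4, s_1_3)])
print(round(s_risk_identification, 1))                      # ~23.2 -> reported 23%
```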


2.1 Setting a Risk Tolerance (35%) 22%

2.1.1 Risk tolerance is defined (80%) 28%

2.1.1.1 Risk tolerance is at least qualitatively defined for all risks (33%) 75%

For each risk domain, they outline the “catastrophic outcomes we must strive to prevent” in detail. Implicitly, this is a risk tolerance. For instance, “Cyber 3: Widespread economic damage to individuals or corporations via scaled long form fraud and scams.” More detail could be given, e.g. on what constitutes “widespread economic damage” and to how many individuals/corporations.

They also set out their risk tolerance more abstractly, though they do not explicitly call it a risk tolerance. For instance, they will not release a model externally if “the model provides significant uplift towards execution of a threat scenario (i.e. significantly enhances performance on key capabilities or tasks needed to produce a catastrophic outcome) but does not enable execution of any threat scenario that has been identified as potentially sufficient to produce a catastrophic outcome.” This means their implicit risk tolerance is the risk level associated with this scenario.

To improve, they should set out the risk tolerance for each risk domain in terms of probability and severity, and separate from KRIs. Defining risk tolerance in terms of tangible harm would be more comprehensible to external stakeholders such as policymakers. For example, this could be expressed as economic damages for cybersecurity risks and as number of fatalities for chemical and biological risks.

Quotes:

“We start by identifying a set of catastrophic outcomes we must strive to prevent, and then map the potential causal pathways that could produce them.” (p. 10)

They describe each of the outcomes they want to prevent:
“Cyber 1: Automated end-to-end compromise of a best-practice protected corporate-scale environment (ex. Fully patched, MFA-protected)

Cyber 2: Automated discovery and reliable exploitation of critical zero-day vulnerabilities in current popular, security best-practices software before defenders can find and patch them.

Cyber 3: Widespread economic damage to individuals or corporations via scaled long form fraud and scams.

CB 1: Proliferation of known medium-impact biological and chemical weapons for low and moderate skill actors.

CB 2: Proliferation of high-impact biological weapons, with capabilities equivalent to known agents, for high-skilled actors.

CB 3: Development of high-impact biological weapons with novel capabilities for high-skilled actors.” (pp. 14-15)

2.1.1.2 Risk tolerance is expressed at least partly quantitatively as a combination of scenarios (qualitative) and probabilities (quantitative) for all risks (33%) 10%

The risk tolerance, implicit or otherwise, is not expressed fully or partly quantitatively. To improve, the risk tolerance should be expressed fully quantitatively or as a combination of scenarios with probabilities.

Nonetheless, they mention an intent to quantify risks and benefits; this shows an acknowledgment of the need to quantify risks, including the risk tolerance. Partial credit is given here.

Quotes:

“We hope that sharing our current approach to development of advanced AI systems will not only promote transparency into our decision-making processes but also encourage discussion and research on how to improve the science of AI evaluation and the quantification of risks and benefits.” (p. 2)

2.1.1.3 Risk tolerance is expressed fully quantitatively as a product of severity (quantitative) and probability (quantitative) for all risks (33%) 0%

Whilst they mention an intent to quantify risks (and benefits), there is no risk tolerance defined quantitatively using severity and probability.

Quotes:

“We hope that sharing our current approach to development of advanced AI systems will not only promote transparency into our decision-making processes but also encourage discussion and research on how to improve the science of AI evaluation and the quantification of risks and benefits.” (p. 2)

2.1.2 Process to define the tolerance (20%) 0%

2.1.2.1 AI developers engage in public consultations or seek guidance from regulators where available (50%) 0%

No evidence of engaging in public consultations or seeking guidance from regulators for risk tolerance.

Quotes:

No relevant quotes found.

2.1.2.2 Any significant deviations from risk tolerance norms established in other industries is justified and documented (e.g., cost-benefit analyses) (50%) 0%

No evidence of considering whether their approach aligns with or deviates from established norms.

Quotes:

No relevant quotes found.

2.2 Operationalizing Risk Tolerance (65%) 34%

2.2.1 Key Risk Indicators (KRI) (30%) 33%

2.2.1.1 KRI thresholds are at least qualitatively defined for all risks (45%) 50%

They give “example enabling capabilities”, but not the actual KRIs used. To improve, they should commit to actually using these KRIs in their risk management framework, or otherwise detail which KRIs will be used. However, the example KRIs given are clear and measurable, map to actual evaluation results, and appear grounded in risk modeling.

Quotes:

Under “Example Enabling Capabilities”, there are instances of KRIs for each outcome-threat scenario pair. For instance, for “Cyber 1: Automated end-to-end compromise of a best-practice protected corporate-scale environment (ex. Fully patched, MFA-protected)”, they give the KRI “Autonomous cyber operations: Ability to reliably and successfully complete complex CTF challenges at the level of a professional cyber expert.” (p. 14), or for “CB 1: Proliferation of known medium-impact biological and chemical weapons for low and moderate skill actors”, they give “Graduate level knowledge in biology, biochemistry, and chemistry; PhD level proficiency in the relevant sub-specialty for the threat in question; Summarization of scientific and technical information in a way that’s accessible to a non-expert audience” (p. 15).

2.2.1.2 KRI thresholds are quantitatively defined for all risks (45%) 0%

They explicitly do not define quantitative thresholds, though their KRIs are likely able to be quantified, e.g. “Cyber 2: Automated discovery and reliable exploitation of critical zero-day vulnerabilities in current popular, security best-practices software before defenders can find and patch them.” or “CB 2: Proliferation of high-impact biological weapons, with capabilities equivalent to known agents, for high-skilled actors.”

Whilst it may not be possible to define a “fixed set of quantitative metrics” that would always be sufficient risk indicators, they should still publish the actual evaluations and thresholds which they currently operate under. These thresholds could be conservative estimates until improved risk indicators are developed. This is because KRI-KCI pairings should be as predictable in advance as possible.

Quotes:

“With current evaluations, it is not possible to define a fixed set of quantitative metrics that would indicate sufficient performance across enabling capabilities. We make this assessment [of whether models have crossed capability thresholds] through a process of expert deliberation and analysis of the evidence through our AI governance process.” (p. 16, footnote 8)


2.2.1.3 KRIs also identify and monitor changes in the level of risk in the external environment (10%) 25%

They note that “we may take into account monetary costs as well as a threat actor’s ability to overcome other barriers to misuse relevant to our threat scenarios such as access to compute, restricted materials, or lab facilities” when determining risk. Whilst this is not quite a risk indicator based on the external environment (i.e., they do not give a threshold that triggers KCIs), it does mean that the KRI does not only factor in model capabilities.

Quotes:

“We may take into account monetary costs as well as a threat actor’s ability to overcome other barriers to misuse relevant to our threat scenarios such as access to compute, restricted materials, or lab facilities. If the results of our evaluations indicate that a frontier AI has a “high” risk threshold by providing significant uplift towards realization of a threat scenario we will not release the frontier AI externally.” (p. 17) and footnote 9, page 17 after “facilities”: “We recognize that as costs for training and adaptation reduce, financial constraints may become less of a barrier to misuse of AI. We will account for changing economic models as necessary.”

2.2.2 Key Control Indicators (KCI) (30%) 15%

2.2.2.1 Containment KCIs (35%) 38%
2.2.2.1.1 All KRI thresholds have corresponding qualitative containment KCI thresholds (50%) 75%

The KRI thresholds High and Critical have clear qualitative containment KCI thresholds. More detail should be provided for the ‘Moderate’ threshold: “Moderate. Security measures will depend on the release strategy.”

Quotes:

“Critical. Access is strictly limited to a small number of experts, alongside security protections to prevent hacking or exfiltration insofar as is technically feasible and commercially practicable.

High. Access is limited to a core research team, alongside security protections to prevent hacking or exfiltration.

Moderate. Security measures will depend on the release strategy.” (p. 13)

2.2.2.1.2 All KRI thresholds have corresponding quantitative containment KCI thresholds (50%) 0%

The containment KCI thresholds are not quantitatively defined.

Quotes:

No relevant quotes found.

2.2.2.2 Deployment KCIs (35%) 5%
2.2.2.2.1 All KRI thresholds have corresponding qualitative deployment KCI thresholds (50%) 10%

Whilst there are qualitative deployment thresholds, they are vague, referring only to reducing risk to “moderate levels”, without defining what counts as moderate. This could be referring to the Moderate deployment level, but there the KCI threshold is only “Mitigations will depend on the result of evaluations and the release strategy.” The purpose of a deployment KCI is to describe what “moderate levels” or “adequate mitigations” actually are; more detail is required.

Quotes:

“Critical. Successful execution of a threat scenario does not necessarily mean that the catastrophic outcome is realizable. If a model appears to uniquely enable the execution of a threat scenario we will pause development while we investigate whether barriers to realizing the catastrophic outcome remain.

Our process is as follows:
a. Implement mitigations to reduce risk to moderate levels, to the extent possible.
[…]
d. If additional barriers do not exist, continue to investigate mitigations, and do not further develop the model until such a time as adequate mitigations have been identified.” (p. 13)

“High. Implement mitigations to reduce risk to moderate levels.” (p. 13)

“Moderate. Mitigations will depend on the result of evaluations and the release strategy.” (p. 13)

2.2.2.2.2 All KRI thresholds have corresponding quantitative deployment KCI thresholds (50%) 0%

There are no quantitative deployment KCI thresholds given.

Quotes:

No relevant quotes found.

2.2.2.3 For advanced KRIs, assurance process KCIs are defined (30%) 0%

There are no assurance process KCIs defined. The framework does not recognize KCIs beyond containment and deployment measures.

Quotes:

No relevant quotes found.

2.2.3 Pairs of thresholds are grounded in risk modeling to show that risks remain below the tolerance (20%) 25%

There is a pairing of KRIs and KCIs, though the way these relate to the risk tolerance is not explicitly detailed. They state that they focus on determining whether residual risk is sufficiently low, given the results of evaluations and the mitigations implemented – partial credit is given for this. However, they have not shown ex ante that the KCI thresholds are sufficiently high to mitigate risk.

Quotes:

“Assess residual risk: We assess residual risk, taking into consideration the details of the risk assessment, the results of evaluations conducted throughout training, and the mitigations that have been implemented.” (p. 8)

“We define our risk thresholds based on the extent to which a frontier AI would uniquely enable execution of any of our threat scenarios. A frontier AI is assigned to the critical risk threshold if we assess that it would uniquely enable execution of a threat scenario. If a frontier AI is assessed to have reached the critical risk threshold and cannot be mitigated, we will stop development and implement the measures outlined in Table 1. Our high and moderate risk thresholds are defined in terms of the level of uplift a frontier AI provides towards realizing a threat scenario. We will develop these models in line with the processes outlined in this Framework, and implement the measures outlined in Table 1.

Our outcomes-led approach allows us to avoid over-ascribing risk based on the presence of a particular capability alone, and instead assesses the potential for the frontier AI to actually enable harm. This approach is designed to effectively anticipate and mitigate catastrophic risk from frontier AI without unduly hindering innovation of models that do not pose catastrophic risks and can yield enormous benefits. For frontier AI that falls below the critical threshold, we will take into account both potential risks and benefits when determining how to develop and release these models. Section 4.4 explains this in more detail.” (p. 12)

2.2.4 Policy to put development on hold if the required KCI threshold cannot be achieved, until sufficient controls are implemented to meet the threshold (20%) 75%

There is a clear commitment to put development on hold until sufficient controls are implemented to meet the critical threshold, and a clear process for this determination. An improvement would be to provide more detail on how development is stopped, and the containment measures applied during this period, to ensure that the risk level does not exceed the risk tolerance at any point. Further, the conditions and process for de-deployment should be given.

Quotes:

“If a frontier AI is assessed to have reached the critical risk threshold and cannot be mitigated, we will stop development and implement the measures outlined in Table 1.” (pp. 4, 12)

“Successful execution of a threat scenario does not necessarily mean that the catastrophic outcome is realizable. If a model appears to uniquely enable the execution of a threat scenario we will pause development while we investigate whether barriers to realizing the catastrophic outcome remain.

Our process is as follows:

a. Implement mitigations to reduce risk to moderate levels, to the extent possible.
b. Conduct a threat modelling exercise to determine whether other barriers to realizing the catastrophic outcome exist
c. If additional barriers exist, update our Framework with the new threat scenarios, and re-run our assessments to assign the model to the appropriate risk threshold
d. If additional barriers do not exist, continue to investigate mitigations, and do not further develop the model until such a time as adequate mitigations have been identified.” (p. 13)


3.1 Implementing Mitigation Measures (50%) 15%

3.1.1 Containment measures (35%) 10%

3.1.1.1 Containment measures are precisely defined for all KCI thresholds (60%) 10%

They specify that “Access is strictly limited to a small number of experts, alongside security protections to prevent hacking or exfiltration insofar as is technically feasible and commercially practicable” for critical capability thresholds; “Access is limited to a core research team, alongside security protections to prevent hacking or exfiltration” for high capability thresholds; and “Security measures will depend on the release strategy” for moderate capability thresholds. These remain high level and require more detail; for instance, measures should be described for how access will remain limited, and what the security protections include.

Quotes:

“Access is strictly limited to a small number of experts, alongside security protections to prevent hacking or exfiltration insofar as is technically feasible and commercially practicable” for critical capability thresholds (p. 13)

“Access is limited to a core research team, alongside security protections to prevent hacking or exfiltration” for high capability thresholds (p. 13)

“Security measures will depend on the release strategy” for moderate capability thresholds (p. 13)

3.1.1.2 Proof that containment measures are sufficient to meet the thresholds (40%) 10%

After mitigations have been implemented, they “assess residual risk”, giving a general process for establishing that the residual risk is below the risk tolerance. However, they do not specifically provide proof that containment measures are sufficient to meet the relevant KCI threshold, and do not provide ex ante reasoning for why they believe their containment measures to be sufficient. This would be required to satisfy the criterion; moreover, making the currently general assessment more specific may also make it more accurate.

Quotes:

“Assess residual risk: We assess residual risk, taking into consideration the details of the risk assessment, the results of evaluations conducted throughout training, and the mitigations that have been implemented.” (p. 8)

“Models that are not being considered for external release will undergo evaluation to assess the robustness of the mitigations we have implemented, which might include adversarial prompting, jailbreak attempts, and red teaming, amongst other techniques. This evaluation also will take into account the narrower availability of those models and the security measures in place to prevent unauthorized access.” (p. 17)

3.1.1.3 Strong third party verification process to verify that the containment measures meet the threshold (100% if 3.1.1.3 > [60% x 3.1.1.1 + 40% x 3.1.1.2]) 0%

There is no mention of third-party verification that containment measures meet the threshold.

Quotes:

No relevant quotes found.
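
For clarity on the override rule referenced in the 3.1.1.3 heading (the same form of rule appears for 3.1.2.3): one plausible reading, which is our assumption rather than a documented methodology, is that the third-party-verification score replaces the 60/40 combination of 3.1.1.1 and 3.1.1.2 whenever it exceeds it. A minimal sketch:

```python
# One plausible reading of the 3.1.1.3 override rule (an assumption, not a
# documented methodology): the third-party verification score takes 100%
# of the weight whenever it exceeds the 60/40 combination of the other two.
def containment_kci_score(s_3111, s_3112, s_3113):
    combined = 0.6 * s_3111 + 0.4 * s_3112
    return s_3113 if s_3113 > combined else combined

# With the scores reported here (10%, 10%, 0%), the override does not
# trigger, and 3.1.1 stays at 0.6*10 + 0.4*10 = 10%, matching its heading.
print(containment_kci_score(10, 10, 0))  # 10.0
```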

3.1.2 Deployment measures (35%) 25%

3.1.2.1 Deployment measures are precisely defined for all KCI thresholds (60%) 25%

Whilst they define deployment measures in general, such as misuse filtering, fine-tuning etc., these are not tied to the KCI thresholds: for all three capability thresholds, they state that they will “Implement mitigations to reduce risk to moderate levels” – hence, it can be assumed the measures are not specific to certain KCI thresholds.

The measures described could also use more detail, e.g. “fine-tuning” alone does not give one a good picture of what the mitigation involves; to improve, the framework should describe what they will fine-tune for, and with how much compute, for instance.

Quotes:

“Models that are not being considered for external release will undergo evaluation to assess the robustness of the mitigations we have implemented, which might include adversarial prompting, jailbreak attempts, and red teaming, amongst other techniques. This evaluation also will take into account the narrower availability of those models and the security measures in place to prevent unauthorized access.” (p. 17)

“Evaluation results also guide the mitigations and controls we implement. The full mitigation strategy will be informed by the risk assessment, the frontier AI’s particular capabilities, and the release plans. Examples of mitigation techniques we implement include:
– Fine-tuning
– Misuse filtering, response protocols
– Sanctions screening and geogating
– Staged release to prepare the external ecosystem” (p. 18)

3.1.2.2 Proof that deployment measures are sufficient to meet the thresholds (40%) 25%

A process for providing proof is defined, though only for models not being considered for external release. Proof is not provided ex ante for why they believe their deployment measures to be sufficient. Further, they should detail how the burden of proof that deployment measures are sufficient differs between models that are and are not considered for external release.

Quotes:

“Models that are not being considered for external release will undergo evaluation to assess the robustness of the mitigations we have implemented, which might include adversarial prompting, jailbreak attempts, and red teaming, amongst other techniques. This evaluation also will take into account the narrower availability of those models and the security measures in place to prevent unauthorized access.” (p. 17)

3.1.2.3 Strong third party verification process to verify that the deployment measures meet the threshold (100% if 3.1.2.3 > [60% x 3.1.2.1 + 40% x 3.1.2.2]) 0%

There is no mention of third-party verification of deployment measures meeting the threshold.

Quotes:

No relevant quotes found.

3.1.3 Assurance processes (30%) 8%

3.1.3.1 Credible plans towards the development of assurance properties (40%) 10%

Whilst there is a commitment to conduct further research in evaluations, mitigations and monitoring, there isn’t a commitment or mention of developing assurance processes.

Quotes:

“As discussed above, we recognize that more research should be done – both within Meta and in the broader ecosystem – around how to measure and manage risk effectively in the development of frontier AI models. To that end, we’ll continue to work on: (1) improving the quality and reliability of evaluations; (2) developing additional, robust mitigation techniques; and (3) more advanced methods for performing post-release monitoring of open source AI models.” (p. 19)

3.1.3.2 Evidence that the assurance properties are enough to achieve their corresponding KCI thresholds (40%) 0%

There is no mention of providing evidence that the assurance processes are sufficient.

Quotes:

No relevant quotes found.

3.1.3.3 The underlying assumptions that are essential for their effective implementation and success are clearly outlined (20%) 25%

There is an implicit acknowledgment that capability evaluations currently assume deception is not taking place: capabilities like deception “might undermine reliability of [evaluation] results”. However, they do not provide similar assumptions for assurance processes, i.e. mitigations. To improve, the framework should detail the key technical assumptions necessary for the assurance processes to meet the KCI threshold, and evidence for why these assumptions are justified.

Quotes:

“Improving the robustness and reliability of evaluations is an area of focus for us, and this includes working to ensure that our testing environments produce results that accurately reflect how the model will perform once in production. This includes accounting for capabilities that might undermine reliability of results, such as deception. Ensuring a robust evaluation environment is therefore an essential step in reliably evaluating and risk assessing frontier AI.” (p. 16)

3.2 Continuous Monitoring and Comparing Results with Pre-determined Thresholds (50%) 26%

3.2.1 Monitoring of KRIs (40%) 20%

3.2.1.1 Justification that elicitation methods used during the evaluations are comprehensive enough to match the elicitation efforts of potential threat actors (30%) 50%

There is a description of elicitation methods being designed to match the elicitation efforts of potential threat actors, though more justification could be provided that these are comprehensive enough. To improve, they could detail which elicitation methods they anticipate different threat actors would use under realistic settings (with sensitive information redacted), and list the elicitation methods used in evaluations.

Quotes:

“Our evaluations are designed to account for the deployment context of the model. This includes assessing whether risks will remain within defined thresholds once a model is deployed or released using the target release approach. For example, to help ensure that we are appropriately assessing the risk, we prepare the asset – the version of the model that we will test – in a way that seeks to account for the tools and scaffolding in the current ecosystem that a particular threat actor might seek to leverage to enhance the model’s capabilities. We also account for enabling capabilities, such as automated AI R&D, that might increase the potential for enhancements to model capabilities.” (p. 17)

3.2.1.2 Evaluation frequency (25%) 0%

There is no specification of evaluation frequency, either in terms of relative variation in the effective compute used in training or in terms of fixed time periods.

Quotes:

“We typically repeat evaluations as a frontier AI nears or completes training.” (p. 18)

“We track the latest technical developments in frontier AI capabilities and evaluation, including through engagement with peer companies and the wider AI community of academics, policymakers, civil society organizations, and governments. We expect to update our Framework as our collective understanding of how to measure and mitigate potential catastrophic risk from frontier AI develops, including related to state actors. This might involve adding, removing, or updating catastrophic outcomes or threat scenarios, or changing the ways in which we prepare models to be evaluated. We may choose to reevaluate certain models in line with our revised Framework.” (p. 19)

3.2.1.3 Description of how post-training enhancements are factored into capability assessments (15%) 25%

There is an explicit consideration of automated AI R&D potentially leading to unanticipated post-training enhancements; this nuance is commendable. More detail could be added on how this factor is accounted for, however. Further, more detail could be added on how they account for the way post-training enhancements’ risk profiles change with different model structures; in particular, post-training enhancements are much more scalable for reasoning models, as inference compute can often be scaled to improve capabilities.

Quotes:

“Our evaluations are designed to account for the deployment context of the model. This includes assessing whether risks will remain within defined thresholds once a model is deployed or released using the target release approach. For example, to help ensure that we are appropriately assessing the risk, we prepare the asset – the version of the model that we will test – in a way that seeks to account for the tools and scaffolding in the current ecosystem that a particular threat actor might seek to leverage to enhance the model’s capabilities. We also account for enabling capabilities, such as automated AI R&D, that might increase the potential for enhancements to model capabilities.” (p. 17)

“We track the latest technical developments in frontier AI capabilities and evaluation, including through engagement with peer companies and the wider AI community of academics, policymakers, civil society organizations, and governments.” (p. 19)

3.2.1.4 Vetting of protocols by third parties (15%) 0%

There is no mention of having the evaluation methodology vetted by third parties.

Quotes:

No relevant quotes found.

3.2.1.5 Replication of evaluations by third parties (15%) 10%

There is no mention of evaluations being replicated; they mention that external parties may be involved in red teaming, at Meta’s discretion.

Quotes:

“For both cyber and chemical and biological risks, we conduct red teaming exercises once a model achieves certain levels of performance in capabilities relevant to these domains, involving external experts when appropriate.” (p. 8)

3.2.2 Monitoring of KCIs (40%) 20%

3.2.2.1 Detailed description of evaluation methodology and justification that KCI thresholds will not be crossed unnoticed (40%) 50%

The framework acknowledges that monitoring is required to ensure KCIs remain within bounds, i.e. that mitigations are adequate. More detail could be given on how adequacy is assessed, how monitoring is conducted, and the frequency of this monitoring.

Quotes:

“As outlined in the introduction, we expect to update our Frontier AI Framework to reflect developments in both the technology and our understanding of how to manage its risks and benefits. To do so, it is necessary to observe models in their deployed context and to monitor how the AI ecosystem is evolving. These observations feed into the work of assessing the adequacy of our mitigations for deployed models, and the efficacy of our Framework. We will update our Framework based on these observations.” (p. 19)

3.2.2.2 Vetting of protocols by third parties (30%) 0%

There is no mention of KCI protocols being vetted by third parties.

Quotes:

No relevant quotes found.

3.2.2.3 Replication of evaluations by third parties (30%) 0%

There is no mention of control evaluations/mitigation testing being replicated or conducted by third parties.

Quotes:

No relevant quotes found.

3.2.3 Transparency of evaluation results (10%) 21%

3.2.3.1 Sharing of evaluation results with relevant stakeholders as appropriate (85%) 25%

There are commitments to share evaluation results, presumably with the public, though they qualify this with “plan to continue” rather than a clear commitment. They do not commit to sharing all the KRI and KCI evaluation results for every model, only “relevant information about how we develop and evaluate our models responsibly”. They do not commit to alerting any stakeholders, such as relevant authorities, if and when Critical capabilities are reached.

Quotes:

“In line with the processes set out in this Framework, we intend to continue to openly release models to the ecosystem. We also plan to continue sharing relevant information about how we develop and evaluate our models responsibly, including through artefacts like model cards and research papers, and by providing guidance to model deployers through resources like our Responsible Use Guides.” (p. 9)

3.2.3.2 Commitment to non-interference with findings (15%) 0%

No commitment to permitting the reports, which detail the results of external evaluations (i.e. any KRI or KCI assessments conducted by third parties), to be written independently and without interference or suppression.

Quotes:

No relevant quotes found.

3.2.4 Monitoring for novel risks (10%) 75%

3.2.4.1 Identifying novel risks post-deployment: engages in some process (post deployment) explicitly for identifying novel risk domains or novel risk models within known risk domains (50%) 75%

There is an explicit process to “identify new catastrophic outcomes and/or threat scenarios”, using “workshops with experts, including external subject matter experts where relevant”. Further, “we conduct periodic threat modelling exercises as a proactive measure to anticipate catastrophic risks from our frontier AI”. They also describe a monitoring setup, which could be built upon to also identify novel risks post-deployment. To improve, more detail could be added on the expertise required for the workshops, or on how often threat modelling exercises are performed.

Quotes:

“In addition to our AI risk assessment (see below), which covers known potential risks, we conduct periodic threat modelling exercises as a proactive measure to anticipate catastrophic risks from our frontier AI. In the event that we identify that a model can enable the end-to-end execution of a threat scenario for a catastrophic outcome, we will conduct a threat modelling exercise in line with the processes in Section 3.2.

The exact format of these exercises may vary. The general process is as follows

  1. Host workshops with experts, including external subject matter experts where relevant, to identify new catastrophic outcomes and/or threat scenarios.
  2. If new catastrophic outcomes and/or threat scenarios are identified, design new assessments to test for them, in consultation with external experts where relevant.” (pp. 6–7)

“As outlined in the introduction, we expect to update our Frontier AI Framework to reflect developments in both the technology and our understanding of how to manage its risks and benefits. To do so, it is necessary to observe models in their deployed context and to monitor how the AI ecosystem is evolving. These observations feed into the work of assessing the adequacy of our mitigations for deployed models, and the efficacy of our Framework. We will update our Framework based on these observations.” (p. 19)

3.2.4.2 Mechanism to incorporate novel risks identified post-deployment (50%) 75%

They mention a willingness to incorporate new risks/“outcomes” in “entirely novel risk domains” to account for “the ways in which frontier AI might introduce novel harms” due to “changes to the threat landscape”. Further, they note that if novel “catastrophic outcomes and/or threat scenarios are identified”, they will “design new assessments to test for them, in consultation with external experts where relevant”. This is unique and shows nuance; to improve, the mechanism could be made more explicit, for instance how newly incorporated risk domains would inform the interpretation of their other risk models.

Quotes:

“By anchoring thresholds on outcomes, we aim to create a precise and somewhat durable set of thresholds, because while capabilities will evolve as the technology develops, the outcomes we want to prevent tend to be more enduring. This is not to say that our outcomes are fixed. It is possible that as our understanding of frontier AI improves, outcomes or threat scenarios might be removed, if we can determine that they no longer meet our criteria for inclusion. We also may need to add new outcomes in the future. Those outcomes might be in entirely novel risk domains, potentially as a result of novel model capabilities, or they might reflect changes to the threat landscape in existing risk domains that bring new kinds of threat actors into scope. This accounts for the ways in which frontier AI might introduce novel harms, as well as its potential to increase the risk of catastrophe in known risk domains.” (p. 10)

“In addition to our AI risk assessment (see below), which covers known potential risks, we conduct periodic threat modelling exercises as a proactive measure to anticipate catastrophic risks from our frontier AI. In the event that we identify that a model can enable the end-to-end execution of a threat scenario for a catastrophic outcome, we will conduct a threat modelling exercise in line with the processes in Section 3.2.
The exact format of these exercises may vary. The general process is as follows:

  1. Host workshops with experts, including external subject matter experts where relevant, to identify new catastrophic outcomes and/or threat scenarios.
  2. If new catastrophic outcomes and/or threat scenarios are identified, design new assessments to test for them, in consultation with external experts where relevant.” (pp. 6–7)

4.1 Decision-making (25%) 30%

4.1.1 The company has clearly defined risk owners for every key risk identified and tracked (25%) 10%

The framework does not list designated risk owners. It references senior decision-makers’ involvement in the process, but in order to improve, it should include distinct risk owners for each risk.

Quotes:

“Findings at any stage might prompt discussions via our centralized review process, which ensures that senior decision-makers are involved throughout the lifecycle of development and release.” (p. 5)

4.1.2 The company has a dedicated risk committee at the management level that meets regularly (25%) 25%

The framework does not reference a management risk committee, but references decisions being made by a specific leadership team.

Quotes:

“Informed by this analysis, a leadership team will either request further testing or information, require additional mitigations or improvements, or they will approve the model for release.” (p. 8)

4.1.3 The company has defined protocols for how to make go/no-go decisions (25%) 75%

The framework provides detailed criteria for decision-making. It commendably outlines a comprehensive process for model development decisions through three stages: Anticipate, Evaluate & mitigate, and Decide. The framework stresses the use of residual risk in the risk assessment. It could improve further by providing more details on who makes the decisions and their timing.

Quotes:

“The residual risk assessment is reviewed by the relevant research and/or product teams, as well as a multidisciplinary team of reviewers as needed. Informed by this analysis, a leadership team will either request further testing or information, require additional mitigations or improvements, or they will approve the model for release.” (p. 8)
“Findings at any stage might prompt discussions via our centralized review process, which ensures that senior decision-makers are involved throughout the lifecycle of development and release.” (p. 5)
“If a frontier AI is assessed to have reached the critical risk threshold and cannot be mitigated, we will stop development and implement the measures outlined”. (p. 4)
“We define our risk thresholds based on the extent to which a frontier AI would uniquely enable execution of any of our threat scenarios.” (p. 10)
“While it is impossible to eliminate subjectivity, we believe that it is important to consider the benefits of the technology we develop. This helps us ensure that we are meeting our goal of delivering those benefits to our community. It also drives us to focus on approaches that adequately mitigate any significant risks that we identify without also eliminating the benefits we hoped to deliver in the first place.” (p. 18)

4.1.4 The company has defined escalation procedures in case of incidents (25%) 10%

The framework does not define explicit escalation procedures in case of incidents. Findings can prompt discussions via the centralized review process involving senior decision-makers, and development will be stopped if a model reaches the critical risk threshold and cannot be mitigated, but no dedicated protocol for escalating incidents, particularly post-deployment, is described. To improve, the framework should set out defined escalation procedures for incidents.

Quotes:

“The residual risk assessment is reviewed by the relevant research and/or product teams, as well as a multidisciplinary team of reviewers as needed. Informed by this analysis, a leadership team will either request further testing or information, require additional mitigations or improvements, or they will approve the model for release.” (p. 8)
“Findings at any stage might prompt discussions via our centralized review process, which ensures that senior decision-makers are involved throughout the lifecycle of development and release.” (p. 5)
“If a frontier AI is assessed to have reached the critical risk threshold and cannot be mitigated, we will stop development and implement the measures outlined”. (p. 4)
“We define our risk thresholds based on the extent to which a frontier AI would uniquely enable execution of any of our threat scenarios.” (p. 10)
“While it is impossible to eliminate subjectivity, we believe that it is important to consider the benefits of the technology we develop. This helps us ensure that we are meeting our goal of delivering those benefits to our community. It also drives us to focus on approaches that adequately mitigate any significant risks that we identify without also eliminating the benefits we hoped to deliver in the first place.” (p. 18)

4.2. Advisory and Challenge (20%) 21%

4.2.1 The company has an executive risk officer with sufficient resources (16.7%) 0%

No mention of an executive risk officer.

Quotes:

No relevant quotes found.

4.2.2 The company has a committee advising management on decisions involving risk (16.7%) 25%

The framework does not mention an advisory committee per se. It mentions multi-disciplinary engagement by company leaders. To improve, they should follow the best practice of having a specific committee with risk expertise that can advise management on risk decisions.

Quotes:

“The risk assessment process involves multi-disciplinary engagement, including internal and, where appropriate, external experts from various disciplines (which could include engineering, product management, compliance and privacy, legal, and policy) and company leaders from multiple disciplines.” (p. 7)

4.2.3 The company has an established system for tracking and monitoring risks (16.7%) 50%

The framework describes a fairly comprehensive system for monitoring risk indicators. To improve, they should provide more details on how indicators are analyzed and related to risk levels.

Quotes:

“Throughout development, we monitor performance against our expectations for the reference class as well as the enabling capabilities we have identified in our threat scenarios, and use these indicators as triggers for further evaluations as capabilities develop.” (p. 7)
“We track the latest technical developments in frontier AI capabilities and evaluation, including through engagement with peer companies and the wider AI community of academics, policymakers, civil society organizations, and governments.” (p. 19)

4.2.4 The company has designated people that can advise and challenge management on decisions involving risk (16.7%) 25%

The framework does not mention risk experts designated to challenge decisions. It references the involvement of experts in the risk management process, but to improve, it should follow the best practice of designating people with risk expertise who can advise and challenge management on decisions involving risk.

Quotes:

“Host workshops with experts, including external subject matter experts where relevant, to identify new catastrophic outcomes and/or threat scenarios.” (p. 7)
“The risk assessment process involves multi-disciplinary engagement, including internal and, where appropriate, external experts from various disciplines”. (p. 7)
“Our threat modelling is informed by our own internal experts’ assessment of the catastrophic risks that frontier models might pose, as well as engagements with governments, external experts, and the wider AI community.” (p. 11)

4.2.5 The company has an established system for aggregating risk data and reporting on risk to senior management and the Board (16.7%) 25%

The framework references a process through which leadership can request further testing or information, which suggests that an established reporting system may be in place. However, to score higher, it should specify what risk data is aggregated and how it is reported to senior management and the Board.

Quotes:

“The residual risk assessment is reviewed by the relevant research and/or product teams, as well as a multidisciplinary team of reviewers as needed. Informed by this analysis, a leadership team will either request further testing or information, require additional mitigations or improvements, or they will approve the model for release.” (p. 8)

4.2.6 The company has an established central risk function (16.7%) 0%

No mention of a central risk function.

Quotes:

No relevant quotes found.

4.3 Audit (20%) 5%

4.3.1 The company has an internal audit function involved in AI governance (50%) 0%

No mention of an internal audit function.

Quotes:

No relevant quotes found.

4.3.2 The company involves external auditors (50%) 10%

The framework references the use of external experts, but not external auditors.

Quotes:

“The risk assessment process involves multi-disciplinary engagement, including internal and, where appropriate, external experts from various disciplines (which could include engineering, product management, compliance and privacy, legal, and policy) and company leaders from multiple disciplines.” (p. 7)

4.4 Oversight (20%) 0%

4.4.1 The Board of Directors of the company has a committee that provides oversight over all decisions involving risk (50%) 0%

No mention of a Board risk committee.

Quotes:

No relevant quotes found.

4.4.2 The company has other governing bodies outside of the Board of Directors that provide oversight over decisions (50%) 0%

No mention of any additional governance bodies.

Quotes:

No relevant quotes found.

4.5 Culture (10%) 3%

4.5.1 The company has a strong tone from the top (33.3%) 10%

The framework states a commitment to the responsible advancement of AI. However, to improve, it should also acknowledge the risks arising from the development and deployment of their models.

Quotes:

“At Meta, we believe that the best way to make the most of that opportunity is by building state-of-the-art AI, and releasing it to a global community of researchers, developers, and innovators.” (p. 2)
“We’re committed to advancing the state of the art in AI, on models themselves and on systems to deploy them responsibly, to realize that potential.” (p. 2)

4.5.2 The company has a strong risk culture (33.3%) 0%

No mention of elements of risk culture.

Quotes:

No relevant quotes found.

4.5.3 The company has a strong speak-up culture (33.3%) 0%

No mention of elements of speak-up culture.

Quotes:

No relevant quotes found.

4.6 Transparency (5%) 33%

4.6.1 The company reports externally on what their risks are (33.3%) 50%

The framework names the two risk domains currently in scope and commits to continue sharing artefacts such as model cards and research papers. Further detail on safeguards would contribute to a higher score.

Quotes:

“We include catastrophic outcomes in the following risk domains: Cybersecurity and Chemical & Biological risks.” (p. 14)
“We also plan to continue sharing relevant information about how we develop and evaluate our models responsibly, including through artefacts like model cards and research papers, and by providing guidance to model deployers through resources like our Responsible Use Guides.” (p. 9)

4.6.2 The company reports externally on what their governance structure looks like (33.3%) 25%

The framework has a governance section and outlines a fairly clear governance process in terms of “plan; evaluate and mitigate; and decide”, but does not include sufficient detail on which governance bodies are involved, which would be needed for a higher score.

Quotes:

“As outlined in the introduction, we expect to update our Frontier AI Framework to reflect developments in both the technology and our understanding of how to manage its risks and benefits. To do so, it is necessary to observe models in their deployed context and to monitor how the AI ecosystem is evolving. These observations feed into the work of assessing the adequacy of our mitigations for deployed models, and the efficacy of our Framework. We will update our Framework based on these observations.” (p. 19)
“This Framework builds upon the processes and expertise that have guided the responsible development and release of our research and products over the years. The processes outlined in this Framework describe our approach to developing and releasing Frontier AI specifically.” (p. 5)
“This section provides an overview of the processes we follow when developing and releasing frontier AI to ensure that we are monitoring and managing risk throughout.” (p. 5)
“Our governance approach can be split into three main stages: plan; evaluate and mitigate; and decide. Findings at any stage might prompt discussions via our centralized review process, which ensures that senior decision-makers are involved throughout the lifecycle of development and release.” (p. 5)

4.6.3 The company shares information with industry peers and government bodies (33.3%) 25%

The framework lists several ways in which the company works with external parties. However, to get a higher score, it would need to be more specific about what information would be shared with external parties and when.

Quotes:

“We track the latest technical developments in frontier AI capabilities and evaluation, including through engagement with peer companies and the wider AI community of academics, policymakers, civil society organizations, and governments.” (p. 19)
“For certain types of catastrophic risk, this will necessarily include working with government officials, who have the specific knowledge and expertise to enable proper assessment.” (p. 7, footnote)
