Magic

Overall score: Very Weak (0.6/5)

Risk Identification: 12%
Risk Analysis and Evaluation: 14%
Risk Treatment: 8%
Risk Governance: 10%

  • Magic should be recognized especially for the third-party vetting of its protocols: it gains input from relevant experts on the development of “detailed dangerous capability evaluations” and requires approval from the Board of Directors, made “with input from external security and AI safety advisers”, to change which benchmarks are used as KRIs.
Overview
Highlights relative to others

Quantitative risk indicators are given.

The risk indicators for their risk domains refer to various risk models.

They commit to conducting evaluations quarterly, alongside a quarterly report to the Board on the implementation of their risk management framework.

Weaknesses relative to others

Lacking many key governance mechanisms, such as an internal audit function, a speak-up culture, a management advisory group and challenge of management.

No risk modeling methodology is given.

Lacking justification for excluding some risks.

Lacking commitment to share information externally, such as with industry peers, government or other stakeholders.

1.1 Classification of Applicable Known Risks (40%) 25%

1.1.1 Risks from literature and taxonomies are well covered (50%) 50%

Risks covered include Cyberoffense, AI R&D, Autonomous Replication and Adaptation, and Biological Weapons Assistance. It is commendable that they reference the White House Executive Order on AI to inform risk identification.

They do not include chemical, nuclear or radiological risks, nor manipulation, and 1.1.2 scores less than 50%. Whilst it is commendable that they break down loss of control risks to Autonomous Replication and Adaptation, more justification should be given on why this adequately covers loss of control risks.

To improve, they could also reference the wider literature to show they are engaging in systematic exploration of risks, so that risk domains highlighted by experts are not missed.

Quotes:

“Our current understanding suggests at least four threat models of concern as our AI systems become more capable: Cyberoffense, AI R&D, Autonomous Replication and Adaptation (ARA), and potentially Biological Weapons Assistance. Analogously, the White House Executive Order on AI lays out risks including “lowering the barrier to entry for the development, acquisition, and use of biological weapons by non-state actors; the discovery of software vulnerabilities and development of associated exploits; the use of software or tools to influence real or virtual events; [and] the possibility for self-replication or propagation”.”

Risk domains include: Cyberoffense, AI R&D, Autonomous Replication and Adaptation, Biological Weapons Assistance

1.1.2 Exclusions are clearly justified and documented (50%) 0%

There is no justification given for why they have excluded certain categories of risk, such as chemical, nuclear or radiological risks, and manipulation.

Quotes:

No relevant quotes found.
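
Note on scoring: each criterion score appears to be the weighted average of its sub-criterion scores, with the weights given in parentheses. A minimal worked example for 1.1: 50% × 50% (for 1.1.1) + 50% × 0% (for 1.1.2) = 25%, matching the score shown in the 1.1 heading.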

1.2 Identification of Unknown Risks (Open-ended red teaming) (20%) 0%

1.2.1 Internal open-ended red teaming (70%) 0%

The framework doesn’t mention any procedures pre-deployment to identify novel risk domains or risk models for the frontier model. To improve, they should commit to such a process to identify either novel risk domains, or novel risk models/changed risk profiles within pre-specified risk domains (e.g. emergence of an extended context length allowing improved zero shot learning changes the risk profile), and provide methodology, resources and required expertise.

Quotes:

No relevant quotes found.

1.2.2 Third party open-ended red teaming (30%) 0%

The framework doesn’t mention any third-party procedures pre-deployment to identify novel risk domains or risk models for the frontier model. To improve, they should commit to an external process to identify either novel risk domains, or novel risk models/changed risk profiles within pre-specified risk domains (e.g. emergence of an extended context length allowing improved zero shot learning changes the risk profile), and provide methodology, resources and required expertise.

Quotes:

No relevant quotes found.

1.3 Risk modeling (40%) 4%

1.3.1 The company uses risk models for all the risk domains identified and the risk models are published (with potentially dangerous information redacted) (40%) 10%

Whilst they refer to ‘threat models’ and ‘covered threat models’, these seem to refer more to singular risk models which are treated as the main risk domains, rather than being one of many risk models completed for a particular risk domain. For instance, they call “Cyberoffense” a “threat model” with corresponding Critical Capability Threshold: “The model is effective at real-world cyberoffense operations. With the model’s help, the cost of discovery of new zero-days or breaking critical infrastructure for a malicious expert actor is reduced by at least 10x; accessing classified and not previously accessible information from a major government; a talented Computer Science undergrad level malicious actor spending 3 months and $1m in compute has a substantial chance of breaking critical infrastructure.” This shows evidence of multiple threat scenarios that are measurable.

Hence, whilst they probably do engage in risk modelling by mapping out causal pathways for harm (which they call “mechanisms via which AI systems could cause a major catastrophe in the near future”), there does not seem to be a structured process for doing this risk modeling as these causal pathways are not given. Whilst they indicate that they “may add more threat models as we learn more”, it is not clear that this is distinct from risk domains.

Quotes:

“Evaluations for Covered Threat Models. We use the term threat models to refer to proposed mechanisms via which AI systems could cause a major catastrophe in the near future.

An internal team will develop and execute evaluations that can provide early warnings of whether the AI systems we’ve built increase the risk from our Covered Threat Models. This team may include technical experts, security researchers, and relevant subject matter experts.”

“We value making principled commitments that hold true over time, and that are based on the latest in model advancements and analyses of threat models, rather than speculations. Our initial commitments detail four Covered Threat Models, but we will iteratively improve these and may add more threat models as we learn more.”

1.3.2 Risk modeling methodology (40%) 0%

1.3.2.1 Methodology precisely defined (70%) 0%

There is no methodology for risk modeling defined.

Quotes:

No relevant quotes found.

1.3.2.2 Mechanism to incorporate red teaming findings (15%) 0%

No mention of risks identified during open-ended red teaming or evaluations triggering further risk modeling.

Quotes:

No relevant quotes found.

1.3.2.3 Prioritization of severe and probable risks (15%) 0%

There is no indication that the most severe/probable harms are prioritized.

Quotes:

No relevant quotes found.

1.3.3 Third party validation of risk models (20%) 0%

There is no reference to third parties validating risk models.

Quotes:

No relevant quotes found.


2.1 Setting a Risk Tolerance (35%) 9%

2.1.1 Risk tolerance is defined (80%) 12%

2.1.1.1 Risk tolerance is at least qualitatively defined for all risks (33%) 25%

There is a proto risk tolerance in each of the critical capability thresholds. However, there should only be one risk tolerance for each risk domain, which applies no matter the risk model (e.g., whether it is a “talented Computer Science undergrad” or a “malicious expert actor” that leads to harm). To improve, they must set out the maximum amount of risk the company is willing to accept for each risk domain (though the tolerances need not differ between risk domains), ideally expressed in terms of probabilities and severity (economic damages, physical lives, etc.), and separate from KRIs.

Quotes:

Cyberoffense. “Critical Capability Threshold: The model is effective at real-world cyberoffense operations. With the model’s help, the cost of discovery of new zero-days or breaking critical infrastructure for a malicious expert actor is reduced by at least 10x; accessing classified and not previously accessible information from a major government; a talented Computer Science undergrad level malicious actor spending 3 months and $1m in compute has a substantial chance of breaking critical infrastructure.”

AI R&D. “Critical Capability Threshold: The model facilitates a dramatic acceleration in the rate of effective compute scaling or can effectively replace high-level machine learning researchers. Such capabilities may enable it to contribute to the unsafe proliferation or enhancement of other dangerous AI capabilities.”

Autonomous Replication and Adaptation. “Critical Capability Threshold: If given access to its own weights and prompted to do so, the AI model could autonomously maintain its own operation executing cybercrime and using the proceeds to create arbitrarily many replicas of itself.”

Biological Weapons Assistance. “Critical Capability Threshold: The model provides expert-level guidance which increases the accessibility or severity of bioweapon development. With the model’s help, a non-expert malicious actor can synthesize a viable pandemic agent, or an expert can synthesize a novel biological threat.”

2.1.1.2 Risk tolerance is expressed at least partly quantitatively as a combination of scenarios (qualitative) and probabilities (quantitative) for all risks (33%) 10%

Insofar as the capability thresholds are proto risk tolerances, there are no probabilities given. However, they do describe the Cyberoffense threshold as “With the model’s help, […] a talented Computer Science undergrad level malicious actor spending 3 months and $1m in compute has a substantial chance of breaking critical infrastructure.” This is measurable and “substantial chance” could be further qualified. Nonetheless, partial credit here is given.

To improve, the risk tolerance should be expressed fully quantitatively or as a combination of scenarios with probabilities.

Quotes:

Cyberoffense. “Critical Capability Threshold: The model is effective at real-world cyberoffense operations. With the model’s help, the cost of discovery of new zero-days or breaking critical infrastructure for a malicious expert actor is reduced by at least 10x; accessing classified and not previously accessible information from a major government; a talented Computer Science undergrad level malicious actor spending 3 months and $1m in compute has a substantial chance of breaking critical infrastructure.”

AI R&D. “Critical Capability Threshold: The model facilitates a dramatic acceleration in the rate of effective compute scaling or can effectively replace high-level machine learning researchers. Such capabilities may enable it to contribute to the unsafe proliferation or enhancement of other dangerous AI capabilities.”

Autonomous Replication and Adaptation. “Critical Capability Threshold: If given access to its own weights and prompted to do so, the AI model could autonomously maintain its own operation executing cybercrime and using the proceeds to create arbitrarily many replicas of itself.”

Biological Weapons Assistance. “Critical Capability Threshold: The model provides expert-level guidance which increases the accessibility or severity of bioweapon development. With the model’s help, a non-expert malicious actor can synthesize a viable pandemic agent, or an expert can synthesize a novel biological threat.”

2.1.1.3 Risk tolerance is expressed fully quantitatively as a product of severity (quantitative) and probability (quantitative) for all risks (33%) 0%

Insofar as the capability thresholds are proto risk tolerances, there are no probabilities given. To improve, there should be an explicit risk tolerance, that is expressed fully quantitatively in terms of probabilities and severities (e.g. economic damages, threats to physical life, etc.)

Quotes:

Cyberoffense. “Critical Capability Threshold: The model is effective at real-world cyberoffense operations. With the model’s help, the cost of discovery of new zero-days or breaking critical infrastructure for a malicious expert actor is reduced by at least 10x; accessing classified and not previously accessible information from a major government; a talented Computer Science undergrad level malicious actor spending 3 months and $1m in compute has a substantial chance of breaking critical infrastructure.”

AI R&D. “Critical Capability Threshold: The model facilitates a dramatic acceleration in the rate of effective compute scaling or can effectively replace high-level machine learning researchers. Such capabilities may enable it to contribute to the unsafe proliferation or enhancement of other dangerous AI capabilities.”

Autonomous Replication and Adaptation. “Critical Capability Threshold: If given access to its own weights and prompted to do so, the AI model could autonomously maintain its own operation executing cybercrime and using the proceeds to create arbitrarily many replicas of itself.”

Biological Weapons Assistance. “Critical Capability Threshold: The model provides expert-level guidance which increases the accessibility or severity of bioweapon development. With the model’s help, a non-expert malicious actor can synthesize a viable pandemic agent, or an expert can synthesize a novel biological threat.”

2.1.2 Process to define the tolerance (20%) 0%

2.1.2.1 AI developers engage in public consultations or seek guidance from regulators where available (50%) 0%

No evidence of asking the public what risk levels they find acceptable. No evidence of seeking regulator input specifically on what constitutes acceptable risk levels.

Quotes:

No relevant quotes found.

2.1.2.2 Any significant deviations from risk tolerance norms established in other industries is justified and documented (e.g., cost-benefit analyses) (50%) 0%

No justification process: No evidence of considering whether their approach aligns with or deviates from established norms.

Quotes:

No relevant quotes found.

2.2 Operationalizing Risk Tolerance (65%) 17%

2.2.1 Key Risk Indicators (KRI) (30%) 21%

2.2.1.1 KRI thresholds are at least qualitatively defined for all risks (45%) 25%

There are risk indicators given in the form of LiveCodeBench results (>50%) and private benchmarks. To improve, the private benchmarks should at least be described, so that what their thresholds measure is transparent. Further, some justification for why LiveCodeBench is an appropriate KRI is needed, as it otherwise seems arbitrary; that is, the KRI does not appear to be grounded in risk modelling.

Quotes:

“We compare our models’ capability to publicly available closed and open-source models, to determine whether our models are sufficiently capable such that there is a real risk of setting a new state-of-the-art in dangerous AI capabilities.

A representative public benchmark we will use is LiveCodeBench, which aggregates problems from various competitive programming websites. As of publishing, the best public models currently have the following scores (Pass@1 on Code Generation, evaluation timeframe: estimated knowledge cut-off date to latest LiveCodeBench evaluation set):

Claude-3.5-Sonnet: 48.8% (04/01/2024 – 06/01/2024)
GPT-4-Turbo-2024-04-09: 43.9% (05/01/2023 – 06/01/2024)
GPT-4O-2024-05-13: 43.4% (11/01/2023 – 06/01/2024)
GPT-4-Turbo-1106: 38.8% (05/01/2023 – 06/01/2024)
DeepSeekCoder-V2: 38.1% (12/01/2023 – 06/01/2024)
Based on these scores, when, at the end of a training run, our models exceed a threshold of 50% accuracy on LiveCodeBench, we will trigger our commitment to incorporate a full system of dangerous capabilities evaluations and planned mitigations into our AGI Readiness Policy, prior to substantial further model development, or publicly deploying such models.

As an alternative threshold definition, we will also make use of a set of private benchmarks that we use internally to assess our product’s level of software engineering capability. For comparison, we will also perform these evaluations on publicly available AI systems that are generally considered to be state-of-the-art. We will have privately specified thresholds such that if we see that our model performs significantly better than publicly available models, this is considered evidence that we may be breaking new ground in terms of AI systems’ dangerous capabilities. Reaching these thresholds on our private benchmarks will also trigger our commitments to develop our full AGI Readiness Policy, with threat model evaluations and mitigations, before substantial further model development or deployment.

The expanded AGI Readiness Policy required by the above commitments will also specify more comprehensive guidelines for evaluation thresholds that apply during development and training, not just deployment, of future advanced models that cross certain eval thresholds.”
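
To make the quoted trigger mechanism concrete, the following is a minimal sketch of the decision logic as we read it. All names (agi_readiness_triggered, private_margins, and so on) are hypothetical illustrations, not part of Magic’s framework, and the private-benchmark margins are privately specified by Magic and unknown here.

# Illustrative sketch only; hypothetical names, not Magic's implementation.
LIVECODEBENCH_TRIGGER = 0.50  # Pass@1 threshold quoted in the policy

def agi_readiness_triggered(livecodebench_pass_at_1, private_scores,
                            public_sota_scores, private_margins):
    """Return True if either KRI threshold described in the policy is crossed."""
    # Public KRI: end-of-training-run LiveCodeBench Pass@1 above 50%.
    if livecodebench_pass_at_1 > LIVECODEBENCH_TRIGGER:
        return True
    # Private KRI: internal benchmarks performing "significantly better" than
    # publicly available state-of-the-art systems, by privately specified margins.
    for name, score in private_scores.items():
        if name in public_sota_scores and score > public_sota_scores[name] + private_margins.get(name, 0.0):
            return True
    return False

# If this returns True, the quoted commitment is triggered: incorporating full
# dangerous capability evaluations and planned mitigations into the AGI Readiness
# Policy prior to substantial further development or public deployment.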

2.2.1.2 KRI thresholds are quantitatively defined for all risks (45%) 25%

There are risk indicators given in the form of LiveCodeBench results (>50%) and private benchmarks. These are quantitative and compared to publicly available models, which is commendable. To improve, however, the private benchmarks should at least be described, so that what their thresholds measure is transparent. Further, some justification for why LiveCodeBench is an appropriate KRI is needed, as it otherwise seems arbitrary – it should be linked to risk modelling, for instance.

Quotes:

“We compare our models’ capability to publicly available closed and open-source models, to determine whether our models are sufficiently capable such that there is a real risk of setting a new state-of-the-art in dangerous AI capabilities.

A representative public benchmark we will use is LiveCodeBench, which aggregates problems from various competitive programming websites. As of publishing, the best public models currently have the following scores (Pass@1 on Code Generation, evaluation timeframe: estimated knowledge cut-off date to latest LiveCodeBench evaluation set):

Claude-3.5-Sonnet: 48.8% (04/01/2024 – 06/01/2024)
GPT-4-Turbo-2024-04-09: 43.9% (05/01/2023 – 06/01/2024)
GPT-4O-2024-05-13: 43.4% (11/01/2023 – 06/01/2024)
GPT-4-Turbo-1106: 38.8% (05/01/2023 – 06/01/2024)
DeepSeekCoder-V2: 38.1% (12/01/2023 – 06/01/2024)
Based on these scores, when, at the end of a training run, our models exceed a threshold of 50% accuracy on LiveCodeBench, we will trigger our commitment to incorporate a full system of dangerous capabilities evaluations and planned mitigations into our AGI Readiness Policy, prior to substantial further model development, or publicly deploying such models.

As an alternative threshold definition, we will also make use of a set of private benchmarks that we use internally to assess our product’s level of software engineering capability. For comparison, we will also perform these evaluations on publicly available AI systems that are generally considered to be state-of-the-art. We will have privately specified thresholds such that if we see that our model performs significantly better than publicly available models, this is considered evidence that we may be breaking new ground in terms of AI systems’ dangerous capabilities. Reaching these thresholds on our private benchmarks will also trigger our commitments to develop our full AGI Readiness Policy, with threat model evaluations and mitigations, before substantial further model development or deployment.

The expanded AGI Readiness Policy required by the above commitments will also specify more comprehensive guidelines for evaluation thresholds that apply during development and training, not just deployment, of future advanced models that cross certain eval thresholds.”

2.2.1.3 KRIs also identify and monitor changes in the level of risk in the external environment (10%) 0%

There are no KRIs based on levels of risk in the external environment. Whilst their private benchmark thresholds are set relative to publicly available state-of-the-art models, satisfying this criterion requires a KRI that is contingent on risk conditions external to the model’s capabilities.

Quotes:

“As an alternative threshold definition, we will also make use of a set of private benchmarks that we use internally to assess our product’s level of software engineering capability. For comparison, we will also perform these evaluations on publicly available AI systems that are generally considered to be state-of-the-art. We will have privately specified thresholds such that if we see that our model performs significantly better than publicly available models, this is considered evidence that we may be breaking new ground in terms of AI systems’ dangerous capabilities. Reaching these thresholds on our private benchmarks will also trigger our commitments to develop our full AGI Readiness Policy, with threat model evaluations and mitigations, before substantial further model development or deployment.”

2.2.2 Key Control Indicators (KCI) (30%) 13%

2.2.2.1 Containment KCIs (35%) 25%
2.2.2.1.1 All KRI thresholds have corresponding qualitative containment KCI thresholds (50%) 50%

They give containment measures based on RAND’s Securing Artificial Intelligence Model Weights report, but do not state the containment KCIs explicitly. They do describe proto containment KCIs such as “[making] it extremely difficult for non-state actors, and eventually state-level actors, to steal our model weights” and “limit unauthorized access to LLM training environments, code, and parameters.” More detail could be added on what constitutes unauthorized access, and the KCIs could be linked more explicitly to KRI thresholds.

Quotes:

“If the engineering team sees evidence that our AI systems have exceeded the current performance thresholds on the public and private benchmarks listed above, the team is responsible for making this known immediately to the leadership team and Magic’s Board of Directors (BOD).
We will then begin executing the dangerous capability evaluations we develop for our Covered Threat Models, and they will begin serving as triggers for more stringent information security measures and deployment mitigations.”

“We will implement the following information security measures, based on recommendations in RAND’s Securing Artificial Intelligence Model Weights report, if and when we observe evidence that our models are proficient at our Covered Threat Models.

Hardening model weight and code security: implementing robust security controls to prevent unauthorized access to our model weights. These controls will make it extremely difficult for non-state actors, and eventually state-level actors, to steal our model weights.
Internal compartmentalization: implementing strong access controls and strong authentication mechanisms to limit unauthorized access to LLM training environments, code, and parameters.”

2.2.2.1.2 All KRI thresholds have corresponding quantitative containment KCI thresholds (50%) 0%

The containment KCI thresholds given are not quantitative.

Quotes:

“We will implement the following information security measures, based on recommendations in RAND’s Securing Artificial Intelligence Model Weights report, if and when we observe evidence that our models are proficient at our Covered Threat Models.

Hardening model weight and code security: implementing robust security controls to prevent unauthorized access to our model weights. These controls will make it extremely difficult for non-state actors, and eventually state-level actors, to steal our model weights.
Internal compartmentalization: implementing strong access controls and strong authentication mechanisms to limit unauthorized access to LLM training environments, code, and parameters.”

2.2.2.2 Deployment KCIs (35%) 13%
2.2.2.2.1 All KRI thresholds have corresponding qualitative deployment KCI thresholds (50%) 25%

The mitigations they describe are proto deployment KCI thresholds, for instance models “robustly refuse requests for aid in causing harm” and “output safety classifiers [prevent] serious misuse of models.” However, these are only mitigations they “might” employ; a more structured process where clear, measurable deployment KCIs are linked to KRIs is needed.

Quotes:

“Deployment mitigations aim to disable dangerous capabilities of our models once detected. These mitigations will be required in order to make our models available for wide use, if the evaluations for our Covered Threat Models trigger.

The following are two examples of deployment mitigations we might employ:

Harm refusal: we will train our models to robustly refuse requests for aid in causing harm – for example, requests to generate cybersecurity exploits.
Output monitoring: we may implement techniques such as output safety classifiers to prevent serious misuse of models. Automated detection may also apply for internal usage within Magic.
A full set of mitigations will be detailed publicly by the time we complete our policy implementation, as described in this document’s introduction. Other categories of mitigations beyond the two illustrative examples listed above likely will be required.”

2.2.2.2.2 All KRI thresholds have corresponding quantitative deployment KCI thresholds (50%) 0%

The deployment KCI thresholds given are not quantitative, though they could likely easily be made so, e.g. as a refusal rate on a benchmark of harmful requests (see the illustrative sketch after the quotes below).

Quotes:

“Deployment mitigations aim to disable dangerous capabilities of our models once detected. These mitigations will be required in order to make our models available for wide use, if the evaluations for our Covered Threat Models trigger.

The following are two examples of deployment mitigations we might employ:

Harm refusal: we will train our models to robustly refuse requests for aid in causing harm – for example, requests to generate cybersecurity exploits.
Output monitoring: we may implement techniques such as output safety classifiers to prevent serious misuse of models. Automated detection may also apply for internal usage within Magic.
A full set of mitigations will be detailed publicly by the time we complete our policy implementation, as described in this document’s introduction. Other categories of mitigations beyond the two illustrative examples listed above likely will be required.”
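
As suggested above, a deployment KCI such as harm refusal could readily be made quantitative. The sketch below shows one way a refusal-rate threshold might be checked; the threshold value, dataset and names are hypothetical illustrations, not part of Magic’s framework.

# Illustrative sketch only; hypothetical threshold and names, not Magic's framework.
REFUSAL_RATE_KCI = 0.99  # hypothetical quantitative deployment KCI threshold

def refusal_rate(refusals):
    """Fraction of prompts in a harmful-request benchmark that the model refused."""
    return sum(refusals) / len(refusals)

def deployment_kci_met(refusals):
    # refusals[i] is True if the model refused the i-th harmful test prompt.
    return refusal_rate(refusals) >= REFUSAL_RATE_KCI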

2.2.2.3 For advanced KRIs, assurance process KCIs are defined (30%) 0%

There are no assurance process KCIs defined. The framework does not recognize KCIs outside of containment and deployment measures.

Quotes:

No relevant quotes found.

2.2.3 Pairs of thresholds are grounded in risk modeling to show that risks remain below the tolerance (20%) 10%

They mention that the residual risk should be such that they can “continue development and deployment in a safe manner”. They also note that they may raise their KRI thresholds if public evidence emerges that models exceeding them (for instance, other companies’ models) can proliferate without significant catastrophic risk. Together, these show awareness of pairing KRI and KCI thresholds to show that the residual risk remains below the risk tolerance. However, this link could be more explicit and tied to risk modelling, and “a safe manner” should be more precisely defined.

Most importantly, there should be justification for why, if the KRI threshold is crossed but the KCI threshold is met, the residual risk remains below the risk tolerance.

Quotes:

“Prior to publicly deploying models that exceed the current frontier of coding performance, we will evaluate them for dangerous capabilities and ensure that we have sufficient protective measures in place to continue development and deployment in a safe manner.”

“Over time, public evidence may emerge that it is safe for models that have demonstrated proficiency beyond the above thresholds to freely proliferate without posing any significant catastrophic risk to public safety. For this reason, we may update this threshold upward over time. We may also modify the public and private benchmarks used. Such a change will require approval by our Board of Directors, with input from external security and AI safety advisers.”

2.2.4 Policy to put development on hold if the required KCI threshold cannot be achieved, until sufficient controls are implemented to meet the threshold (20%) 25%

There is a clear policy to put development on hold if the KRI evaluations are not developed in time. As for KCIs, they commit to “delaying or pausing development in the worst case until the dangerous capability detected has been mitigated or contained.” However, more clarity on this decision should be given, such as what constitutes sufficient mitigation/containment, and an explicit threshold that would determine pausing development. Conditions and the process for de-deployment should also be detailed.

Quotes:

“If the engineering team sees evidence that our AI systems have exceeded the current performance thresholds on the public and private benchmarks listed above, the team is responsible for making this known immediately to the leadership team and Magic’s Board of Directors (BOD).

We will then begin executing the dangerous capability evaluations we develop for our Covered Threat Models, and they will begin serving as triggers for more stringent information security measures and deployment mitigations. If we have not developed adequate dangerous capability evaluations by the time these benchmark thresholds are exceeded, we will halt further model development until our dangerous capability evaluations are ready.”

“In cases where said risk for any threat model passes a ‘red-line’, we will adopt safety measures outlined in the Threat Mitigations section, which include delaying or pausing development in the worst case until the dangerous capability detected has been mitigated or contained.”


3.1 Implementing Mitigation Measures (50%) 9%

3.1.1 Containment measures (35%) 6%

3.1.1.1 Containment measures are precisely defined for all KCI thresholds (60%) 10%

The containment measures described remain high level, such as “implementing robust security controls” or “strong access controls and strong authentication mechanisms”. The actual ‘controls’ and ‘mechanisms’ implemented should be described in more detail.

They do mention that mitigations will be described in more detail prior to deploying models. However, this planning should occur pre-development as much as possible, in case risks are higher than expected after the model is developed.

Quotes:

“To prepare for these risks, we are introducing an initial version of our AGI Readiness Policy, describing dangerous AI capabilities we plan to monitor, as well as high-level safety and security practices we will adopt to reduce risk. Prior to deploying models with frontier coding capabilities, we will describe these mitigations in more detail. We will also define specific plans for what level of mitigations are necessary in response to a range of dangerous capability thresholds.”

“We will implement the following information security measures, based on recommendations in RAND’s Securing Artificial Intelligence Model Weights report, if and when we observe evidence that our models are proficient at our Covered Threat Models.

Hardening model weight and code security: implementing robust security controls to prevent unauthorized access to our model weights. These controls will make it extremely difficult for non-state actors, and eventually state-level actors, to steal our model weights.
Internal compartmentalization: implementing strong access controls and strong authentication mechanisms to limit unauthorized access to LLM training environments, code, and parameters.”

3.1.1.2 Proof that containment measures are sufficient to meet the thresholds (40%) 0%

No proof is provided that the containment measures are sufficient to meet the containment KCI thresholds, nor process for soliciting such proof.

Quotes:

No relevant quotes found.

3.1.1.3 Strong third party verification process to verify that the containment measures meet the threshold (100% if 3.1.1.3 > [60% x 3.1.1.1 + 40% x 3.1.1.2]) 0%

There is no mention of third-party verification that containment measures meet the threshold.

Quotes:

No relevant quotes found.
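
Note on the formula in the 3.1.1.3 heading: on one reading, the third-party-verification score replaces the weighted combination of 3.1.1.1 and 3.1.1.2 only when it exceeds it. Here, 0% is not greater than 60% × 10% + 40% × 0% = 6%, so the 3.1.1 score remains the weighted combination, 6%.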

3.1.2 Deployment measures (35%) 15%

3.1.2.1 Deployment measures are precisely defined for all KCI thresholds (60%) 25%

The deployment measures described remain high level, such as “train our models to robustly refuse requests”, “output safety classifiers” and “automated detection may also apply”. The actual controls and mechanisms that will be implemented to satisfy the deployment KCI threshold should be described in more detail.

They do mention that mitigations will be described in more detail “by the time that we deploy models that exceed the current frontier of coding capabilities.” However, this planning should occur pre-development, in case risks are higher than expected after the model is developed.

Quotes:

“Prior to deploying models with frontier coding capabilities, we will describe these mitigations in more detail. We will also define specific plans for what level of mitigations are necessary in response to a range of dangerous capability thresholds.”

“Deployment mitigations aim to disable dangerous capabilities of our models once detected. These mitigations will be required in order to make our models available for wide use, if the evaluations for our Covered Threat Models trigger.

The following are two examples of deployment mitigations we might employ:

Harm refusal: we will train our models to robustly refuse requests for aid in causing harm – for example, requests to generate cybersecurity exploits.
Output monitoring: we may implement techniques such as output safety classifiers to prevent serious misuse of models. Automated detection may also apply for internal usage within Magic.
A full set of mitigations will be detailed publicly by the time we complete our policy implementation, as described in this document’s introduction. Other categories of mitigations beyond the two illustrative examples listed above likely will be required.”

3.1.2.2 Proof that deployment measures are sufficient to meet the thresholds (40%) 0%

No proof is provided that the deployment measures are sufficient to meet the deployment KCI thresholds, nor is there a process to solicit such proof.

Quotes:

No relevant quotes found.

3.1.2.3 Strong third party verification process to verify that the deployment measures meet the threshold (100% if 3.1.2.3 > [60% x 3.1.2.1 + 40% x 3.1.2.2]) 0%

There is no mention of third-party verification of deployment measures meeting the threshold.

Quotes:

No relevant quotes found.

3.1.3 Assurance processes (30%) 5%

3.1.3.1 Credible plans towards the development of assurance properties (40%) 10%

Whilst they mention that “By the time that we deploy models that exceed the current frontier of coding capabilities, we commit to having implemented a full set of dangerous capability evaluations and planned mitigations for our Covered Threat Models (described below), as well as having executed our initial dangerous capability evaluations”, this does not explicitly mention assurance processes. Further, assurance processes require further research – there is no commitment given to contributing to this research effort.

Quotes:

“By the time that we deploy models that exceed the current frontier of coding capabilities, we commit to having implemented a full set of dangerous capability evaluations and planned mitigations for our Covered Threat Models (described below), as well as having executed our initial dangerous capability evaluations.”

3.1.3.2 Evidence that the assurance properties are enough to achieve their corresponding KCI thresholds (40%) 0%

There is no mention of providing evidence that the assurance processes are sufficient.

Quotes:

No relevant quotes found.

3.1.3.3 The underlying assumptions that are essential for their effective implementation and success are clearly outlined (20%) 10%

There is an awareness that assumptions are necessary to make certain risk assessment claims, such as for requiring adequate security measures.
However, these are not applied to assurance processes specifically. To improve, assumptions should be stated concerning e.g. the assumed alignment of the model/deception capabilities of the model, such as prevalence of sandbagging or faithfulness of chain of thought, in order for the risk level to remain below the risk tolerance.

Quotes:

“The effectiveness of our deployment mitigations – like training models to refuse harmful requests, continuously monitoring a model’s outputs for misuse, and other proprietary interventions – is generally contingent on the models being securely in our possession. Accordingly, we will place particular emphasis on implementing information security measures.”

3.2 Continuous Monitoring and Comparing Results with Pre-determined Thresholds (50%) 7%

3.2.1 Monitoring of KRIs (40%) 16%

3.2.1.1 Justification that elicitation methods used during the evaluations are comprehensive enough to match the elicitation efforts of potential threat actors (30%) 0%

There is no description of elicitation methods, nor justification that these are comprehensive enough to match the elicitation efforts of potential threat actors.

Quotes:

No relevant quotes found.

3.2.1.2 Evaluation frequency (25%) 0%

Evaluations are conducted at least once a quarter. However, evaluation frequency should also be defined in terms of relative variation in effective compute used in training (e.g. re-running evaluations whenever effective training compute increases by a pre-specified multiple), to ensure KRI thresholds are not crossed unnoticed. It would also be an improvement to state explicitly whether the quarterly cadence is intended to account for post-training enhancements.

Quotes:

“Our process for determining whether our models have reached this frontier involves continuously monitoring our AI systems using public and private benchmarks. In this section, we focus on evaluations using coding benchmarks, as Magic’s models are optimized for code generation.”

“A member of staff will be appointed who is responsible for sharing the following with our Board of Directors on a quarterly basis:

A report on the status of the AGI Readiness Policy implementation
Our AI systems’ current proficiency at the public and private benchmarks laid out above”

3.2.1.3 Description of how post-training enhancements are factored into capability assessments (15%) 0%

There is no description of how post-training enhancements are factored into capability assessments, nor safety margins given.

Quotes:

No relevant quotes found.

3.2.1.4 Vetting of protocols by third parties (15%) 25%

There is a description of gaining input from relevant experts on the development of “detailed dangerous capability evaluations.” Further, approval from the Board of Directors is needed to change which benchmarks are used as KRIs, and this decision is made “with input from external security and AI safety advisers”. This is a good start for satisfying this criterion; however, more detail and a more structured process are required, e.g. detail on which third parties will input into the protocols; whether they simply assist with protocol development or actually review the protocols (favouring the latter); and a guarantee of sufficient expertise and independence.

Quotes:

“We describe these threat models along with high-level, illustrative capability levels that would require strong mitigations. We commit to developing detailed dangerous capability evaluations for these threat models based on input from relevant experts, prior to deploying frontier coding models.”

“Over time, public evidence may emerge that it is safe for models that have demonstrated proficiency beyond the above thresholds to freely proliferate without posing any significant catastrophic risk to public safety. For this reason, we may update this threshold upward over time. We may also modify the public and private benchmarks used. Such a change will require approval by our Board of Directors, with input from external security and AI safety advisers.”

3.2.1.5 Replication of evaluations by third parties (15%) 0%

There is no mention of evaluations being replicated or conducted by third parties.

Quotes:

No relevant quotes found.

3.2.2 Monitoring of KCIs (40%) 0%

3.2.2.1 Detailed description of evaluation methodology and justification that KCI thresholds will not be crossed unnoticed (40%) 0%

No process or justification is given for ensuring that mitigation effectiveness is monitored such that measures always meet the KCI threshold.

Quotes:

No relevant quotes found.

3.2.2.2 Vetting of protocols by third parties (30%) 0%

There is no mention of KCIs protocols being vetted by third parties.

Quotes:

No relevant quotes found.

3.2.2.3 Replication of evaluations by third parties (30%) 0%

There is no mention of control evaluations/mitigation testing being replicated or conducted by third-parties.

Quotes:

No relevant quotes found.

3.2.3 Transparency of evaluation results (10%) 0%

3.2.3.1 Sharing of evaluation results with relevant stakeholders as appropriate (85%) 0%

There is no commitment to publicly share evaluation results, nor to notify relevant government authorities if KRI thresholds are crossed.

Quotes:

No relevant quotes found.

3.2.3.2 Commitment to non-interference with findings (15%) 0%

No commitment to permitting the reports, which detail the results of external evaluations (i.e. any KRI or KCI assessments conducted by third parties), to be written independently and without interference or suppression.

Quotes:

No relevant quotes found.

3.2.4 Monitoring for novel risks (10%) 0%

3.2.4.1 Identifying novel risks post-deployment: engages in some process (post deployment) explicitly for identifying novel risk domains or novel risk models within known risk domains (50%) 0%

There is no mention of a process for identifying novel risks post-deployment.

Quotes:

No relevant quotes found.

3.2.4.2 Mechanism to incorporate novel risks identified post-deployment (50%) 0%

There is no mechanism to incorporate risks identified during post-deployment that is detailed.

Quotes:

No relevant quotes found.


4.1 Decision-making (25%) 19%

4.1.1 The company has clearly defined risk owners for every key risk identified and tracked (25%) 0%

No mention of risk owners.

Quotes:

No relevant quotes found.

4.1.2 The company has a dedicated risk committee at the management level that meets regularly (25%) 0%

No mention of a management risk committee.

Quotes:

No relevant quotes found.

4.1.3 The company has defined protocols for how to make go/no-go decisions (25%) 50%

The policy sets out fairly detailed protocols for go/no-go decision-making.

Quotes:

“Magic’s engineering team… is responsible for conducting evaluations on the public and private coding benchmarks… If the engineering team sees evidence that our AI systems have exceeded the current performance thresholds… the team is responsible for making this known immediately to the leadership team and Magic’s Board of Directors (BOD).” (p. 3)
“If we have not developed adequate dangerous capability evaluations by the time these benchmark thresholds are exceeded, we will halt further model development until our dangerous capability evaluations are ready.” (p. 3)
“In cases where said risk for any threat model passes a ‘red-line’, we will adopt safety measures outlined in the Threat Mitigations section, which include delaying or pausing development in the worst case until the dangerous capability detected has been mitigated or contained.” (p. 4)

4.1.4 The company has defined escalation procedures in case of incidents (25%) 25%

The policy lists one element of escalation procedures – informing management and the Board.

Quotes:

“Magic’s engineering team… is responsible for making this known immediately to the leadership team and Magic’s Board of Directors (BOD).” (p. 3)

4.2 Advisory and Challenge (20%) 5%

4.2.1 The company has an executive risk officer with sufficient resources (16.7%) 0%

No mention of an executive risk officer.

Quotes:

No relevant quotes found.

4.2.2 The company has a committee advising management on decisions involving risk (16.7%) 0%

No mention of an advisory committee.

Quotes:

No relevant quotes found.

4.2.3 The company has an established system for tracking and monitoring risks (16.7%) 10%

The policy references a few benchmarks that will be used to track risks.

Quotes:

“When, at the end of a training run, our models exceed a threshold of 50% accuracy on LiveCodeBench, we will trigger our commitment”. (p. 2)
“We will also make use of a set of private benchmarks that we use internally to assess our product’s level of software engineering capability.” (p. 2)

4.2.4 The company has designated people that can advise and challenge management on decisions involving risk (16.7%) 0%

No mention of people that challenge decisions.

Quotes:

No relevant quotes found.

4.2.5 The company has an established system for aggregating risk data and reporting on risk to senior management and the Board (16.7%) 10%

The policy lists some rudimentary elements of reporting to the Board.

Quotes:

“A member of staff will be appointed who is responsible for sharing the following with our Board of Directors on a quarterly basis: A report on the status of the AGI Readiness Policy implementation, our AI systems’ current proficiency at the public and private benchmarks laid out above”. (p. 3)

4.2.6 The company has an established central risk function (16.7%) 10%

While there does not seem to be a central risk team, the policy mentions a team that will create early warning evaluations.

Quotes:

“An internal team will develop and execute evaluations that can provide early warnings of whether the AI systems we’ve built increase the risk from our Covered Threat Models.” (p. 3)

4.3 Audit (20%) 5%

4.3.1 The company has an internal audit function involved in AI governance (50%) 0%

No mention of an internal audit function.

Quotes:

No relevant quotes found.

4.3.2 The company involves external auditors (50%) 10%

The policy mentions input from external experts, but only as a possibility and not as an independent review.

Quotes:

“Magic’s engineering team, potentially in collaboration with external advisers, is responsible for conducting evaluations on the public and private coding benchmarks described above.” (p. 3)
“Such a change will require approval by our Board of Directors, with input from external security and AI safety advisers.” (p. 3)

4.4 Oversight (20%) 5%

4.4.1 The Board of Directors of the company has a committee that provides oversight over all decisions involving risk (50%) 10%

While it is unclear if there is a designated Board risk committee, it is clear from the policy that the Board has a few designated governance roles.

Quotes:

“For this reason, we may update this threshold upward over time. We may also modify the public and private benchmarks used. Such a change will require approval by our Board of Directors, with input from external security and AI safety advisers.” (p. 3)
“Magic’s engineering team… is responsible for making this known immediately to the leadership team and Magic’s Board of Directors (BOD).” (p. 3)

4.4.2 The company has other governing bodies outside of the Board of Directors that provide oversight over decisions (50%) 0%

No mention of any additional governance bodies.

Quotes:

No relevant quotes found.

4.5 Culture (10%) 12%

4.5.1 The company has a strong tone from the top (33.3%) 25%

The policy includes a few statements that establish a fairly strong tone from the top.

Quotes:

“Building such systems, we believe, will bring enormous societal value. However, we also believe AI development poses the possibility of serious negative externalities on society, including catastrophic risks to public security and wellbeing.” (p. 1)

4.5.2 The company has a strong risk culture (33.3%) 10%

The only element of risk culture that appears in the policy is a mention of a plan to update measures and commitments over time.

Quotes:

“We plan to adapt our safety measures and commitments over time in line with empirical observation of risks posed by the systems that we are developing.” (p. 1)

4.5.3 The company has a strong speak-up culture (33.3%) 0%

No mention of elements of speak-up culture.

Quotes:

No relevant quotes found.

4.6 Transparency (5%) 20%

4.6.1 The company reports externally on what their risks are (33.3%) 50%

The policy lists the risks that are in scope, although with some caveats.

Quotes:

“Our current understanding suggests at least four threat models of concern as our AI systems become more capable: Cyberoffense, AI R&D, Autonomous Replication and Adaptation (ARA), and potentially Biological Weapons Assistance.” (p. 4)

4.6.2 The company reports externally on what their governance structure looks like (33.3%) 10%

The policy includes a mention of the Board’s role in the governance structure.

Quotes:

“2. Reports to Governing Bodies
A member of staff will be appointed who is responsible for sharing the following with our Board of Directors on a quarterly basis: A report on the status of the AGI Readiness Policy implementation…Our AI systems’ current proficiency at the public and private benchmarks laid out above”. (p. 3)

4.6.3 The company shares information with industry peers and government bodies (33.3%) 0%

No mention of information sharing.

Quotes:

No relevant quotes found.
