Amazon

Very Weak 0.9/5

Risk Identification: 11%
Risk Analysis and Evaluation: 16%
Risk Treatment: 23%
Risk Governance: 22%

Best in class

  • Amazon stands out for having a rigorously defined suite of containment measures.
  • They also uniquely set out that they use “formal methods to ensure correctness of security-critical components and subsystems.” This method could likely be implemented to provide ex ante proof that containment measures are sufficient to meet thresholds.
Overview
Highlights relative to others

Stronger transparency, such as sharing information with industry peers and government.

Defined protocols for making go/no-go decisions.

Clear process for how risk is reported to senior management.

Some awareness of open-ended red teaming.

Weaknesses relative to others

No justification for excluding some risks, such as persuasion or loss of control risks, and including others.

No risk tolerance given beyond preventing "severe public safety risks".

Each risk domain has only one risk threshold, without justification for why one is sufficient.

Lack of risk modelling.

1.1 Classification of Applicable Known Risks (40%) 13%

1.1.1 Risks from literature and taxonomies are well covered (50%) 25%

The criterion is partially addressed, covering the risk areas of CBRN weapons proliferation, offensive cyber operations, and automated AI R&D. They do not include other risks often cited in the literature, such as persuasion and loss of control risks, and the score for 1.1.2 is below 50%.

Quotes:

“Critical Capability Thresholds describe model capabilities within specified risk domains that could cause severe public safety risks. When evaluations demonstrate that an Amazon frontier model has crossed these Critical Capability Thresholds, the development team will apply appropriate safeguards.” (p. 2) The thresholds are the following: Chemical, Biological, Radiological, and Nuclear (CBRN) Weapons Proliferation, Offensive Cyber Capabilities, and Automated AI R&D.

1.1.2 Exclusions are clearly justified and documented (50%) 0%

No justification for exclusion of risks such as manipulation or loss of control risks.

Quotes:

No relevant quotes found.

1.2 Identification of Unknown Risks (Open-ended red teaming) (20%) 10%

1.2.1 Internal open-ended red teaming (70%) 10%

There is some indication of engaging in open-ended red teaming internally, via a “strong network” of internal red teamers “with deep subject matter expertise” who are “critical in surfacing early insights into emerging critical capabilities.” This does not necessarily commit to a process explicitly for identifying novel risk domains or novel risk models with the frontier model; however, it does show awareness that a red team’s engagement with the model surfaces new insights about capabilities, especially emergent ones.

To improve, they should explicitly commit to a process to identify either novel risk domains or novel risk models/changed risk profiles within pre-specified risk domains (e.g., an extended context length enabling improved zero-shot learning, which changes the risk profile), and should detail the methodology and expertise of the internal team.

Quotes:

“Learning from our red teaming network: We continue to build our strong network of internal and external red teamers including red teamers with deep subject matter expertise in risks related to critical capabilities. These experts are critical in surfacing early insights into emerging critical capabilities and help us identify and implement appropriate mitigations.” (p. 4)

1.2.2 Third party open-ended red teaming (30%) 10%

There is some indication of engaging in open-ended red teaming externally, via a “strong network” of external red teamers “with deep subject matter expertise” who are “critical in surfacing early insights into emerging critical capabilities.” This does not necessarily commit to a process explicitly for identifying novel risk domains or novel risk models with the frontier model; however, it does show awareness that a red team’s engagement with the model surfaces new insights about capabilities, especially emergent ones, and that there is benefit in soliciting third parties for this activity.

To improve, they should explicitly commit to a process to identify either novel risk domains or novel risk models/changed risk profiles within pre-specified risk domains (e.g., an extended context length enabling improved zero-shot learning, which changes the risk profile), and should detail the methodology and expertise of the external team.

Quotes:

“Learning from our red teaming network: We continue to build our strong network of internal and external red teamers including red teamers with deep subject matter expertise in risks related to critical capabilities. These experts are critical in surfacing early insights into emerging critical capabilities and help us identify and implement appropriate mitigations.” (p. 4)

1.3 Risk modeling (40%) 11%

1.3.1 The company uses risk models for all the risk domains identified
and the risk models are published (with potentially dangerous
information redacted) (40%) 10%

There is no description of risk modelling or of engaging with risk models, though these could readily be added. For instance, where they mention that “The CBRN Capability Threshold focuses on the potential that a frontier model may provide actors material “uplift” in excess of other publicly available research or existing tools, such as internet search”, a risk model should set out how uplift would be provided to these actors via a step-by-step causal pathway, and what precise threat scenarios derive from that pathway (see the illustrative sketch after the quotes below). These risk models should then be published.

They do mention that they engage in “collaboration on threat modeling and updated Critical Capability Thresholds” to “account for evolving (and potentially new) threats.” However, this seems to refer more to which threat scenarios to consider than to mapping out step-by-step causal pathways.

Quotes:

“CBRN Weapons Proliferation focuses on the risk that a model may be able to guide malicious actors in developing and deploying CBRN weapons. The CBRN Capability Threshold focuses on the potential that a frontier model may provide actors material “uplift” in excess of other publicly available research or existing tools, such as internet search.” (p. 2)

“Offensive Cyber Operations focuses on risks that would arise from the use of a model by malicious actors to compromise digital systems with the intent to cause harm. The Offensive Cyber Operations Threshold focuses on the potential that a frontier model may provide material uplift in excess of other publicly available research or existing tools, such as internet search.” (p. 2)

“Automating AI R&D processes could accelerate discovery and development of AI capabilities that will be critical for solving global challenges. However, Automated AI R&D could also accelerate the development of models that pose enhanced CBRN, Offensive Cybersecurity, or other severe risks.” (p. 2)

“Collaboration on threat modeling and updated Critical Capability Thresholds: Amazon is committed to partnering with governments, domain experts, and industry peers to continuously improve Amazon’s awareness of the threat environment and ensure that our Critical Capability Thresholds and evaluation processes account for evolving (and potentially new) threats.” (p. 4)
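
As a purely illustrative sketch of what such a published risk model could contain, the structure below encodes a hypothetical causal pathway for the CBRN uplift scenario. All pathway steps, the threat actor description, and the redaction note are assumptions made for illustration; none are drawn from Amazon's framework.

```python
# Hypothetical, illustrative risk model structure (not from Amazon's framework).
# It shows the kind of step-by-step causal pathway and threat scenario that
# criterion 1.3.1 asks to see published (with dangerous details redacted).
from dataclasses import dataclass, field
from typing import List

@dataclass
class RiskModel:
    domain: str
    threat_actor: str
    causal_pathway: List[str]   # ordered steps from model access to harm
    threat_scenario: str        # the harm the pathway culminates in
    redactions: List[str] = field(default_factory=list)  # withheld specifics

cbrn_uplift = RiskModel(
    domain="CBRN Weapons Proliferation",
    threat_actor="non-subject-matter expert with internet access",
    causal_pathway=[
        "actor obtains access to the frontier model",
        "model provides expert-level, interactive instruction",
        "instruction provides uplift beyond publicly available research and tools",
        "actor acquires materials and follows the instructions",
        "actor produces and deploys a CBRN weapon",
    ],
    threat_scenario="mass-casualty attack enabled by model-provided uplift",
    redactions=["specific agents, precursors, and acquisition routes"],
)

print(f"{cbrn_uplift.domain}: {len(cbrn_uplift.causal_pathway)}-step pathway")
```

Publishing one such pathway per risk domain, with the reasoning behind each step, would move toward satisfying this criterion.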

1.3.2 Risk modeling methodology (40%) 4%

1.3.2.1 Methodology precisely defined (70%) 0%

There is no methodology for risk modeling defined.

Quotes:

No relevant quotes found.

1.3.2.2 Mechanism to incorporate red teaming findings (15%) 0%

There is some reference to identifying mitigations through open-ended red teaming which “[surfaces] early insights”; however, there is no reference to then incorporating these early insights into risk modelling.

Quotes:

“Learning from our red teaming network: We continue to build our strong network of internal and external red teamers including red teamers with deep subject matter expertise in risks related to critical capabilities. These experts are critical in surfacing early insights into emerging critical capabilities and help us identify and implement appropriate mitigations.” (p. 4)

 

1.3.2.3 Prioritization of severe and probable risks (15%) 25%

There is an implicit prioritization of severe harms, but not the most probable harms. There is no indication that risk models are given severity/probability scores (qualitative or quantitative).

Quotes:

“This Framework outlines the protocols we will follow to ensure that frontier models developed by Amazon do not expose critical capabilities that have the potential to create severe risks.” (p. 1)

“This Framework focuses on severe risks that are unique to frontier AI models as they scale in size and capability and which require specialized evaluation methods and safeguards.” (p. 1)

1.3.3 Third party validation of risk models (20%) 25%

Amazon indicates a commitment to “partnering” with third parties to give input into “threat modeling”, in order to “improve Amazon’s awareness of the threat environment.” To improve, more detail is required on how third parties not only give input but also validate risk models, and the experts involved should ideally be named.

Quotes:

“Collaboration on threat modeling and updated Critical Capability Thresholds: Amazon is committed to partnering with governments, domain experts, and industry peers to continuously improve Amazon’s awareness of the threat environment and ensure that our Critical Capability Thresholds and evaluation processes account for evolving (and potentially new) threats.” (p. 4)

2.1 Setting a Risk Tolerance (35%) 7%

2.1.1 Risk tolerance is defined (80%) 8%

2.1.1.1 Risk tolerance is at least qualitatively defined for all risks (33%) 25%

There is no explicit reference to a risk tolerance, though implicitly it is some level of risk that “could cause severe public safety risks” (p. 2). The risk tolerance for each risk domain is implicitly defined by critical capability thresholds. For instance, CBRN Weapons Proliferation: “AI at this level will be capable of providing expert-level, interactive instruction that provides material uplift (beyond other publicly available research or tools) that would enable a non-subject matter expert to reliably produce and deploy a CBRN weapon.”

To improve, they should set out the maximum amount of risk the company is willing to accept for each risk domain (though this need not differ between risk domains), ideally expressed in terms of probability and severity (economic damage, loss of life, etc.), and separate from KRIs.

Quotes:

“Critical Capability Thresholds describe model capabilities within specified risk domains that could cause severe public safety risks.” (p. 2)

CBRN Weapons Proliferation: “AI at this level will be capable of providing expert-level, interactive instruction that provides material uplift (beyond other publicly available research or tools) that would enable a non-subject matter expert to reliably produce and deploy a CBRN weapon.” (p. 2)

Offensive Cyber Operations: “AI at this level will be capable of providing material uplift (beyond other publicly available research or tools) that would enable a moderately skilled actor (e.g., an individual with undergraduate level understanding of offensive cyber activities or operations) to discover new, high-value vulnerabilities and automate the development and exploitation of such vulnerabilities.” (p. 2)

Automated AI R&D: “AI at this level will be capable of replacing human researchers and fully automating the research, development, and deployment of frontier models that will pose severe risk such as accelerating the development of enhanced CBRN weapons and offensive cybersecurity methods.” (p. 2)

2.1.1.2 Risk tolerance is expressed at least partly quantitatively as a combination of scenarios (qualitative) and probabilities (quantitative) for all risks (33%) 0%

The implicit risk tolerance of potentially causing “severe public safety risks” is neither a quantitative nor a partly quantitative definition. Further, the implicit risk tolerances offered by the critical capability thresholds are neither quantitative nor partly quantitative. To improve, the risk tolerance should be expressed fully quantitatively or as a combination of scenarios with probabilities.

Quotes:

“Critical Capability Thresholds describe model capabilities within specified risk domains that could cause severe public safety risks.” (p. 2)

2.1.1.3 Risk tolerance is expressed fully quantitatively as a product of severity (quantitative) and probability (quantitative) for all risks (33%) 0%

The implicit risk tolerance of potentially causing “severe public safety risks” is neither a quantitative nor a partly quantitative definition. The implicit risk tolerances given by the critical capability thresholds are not fully quantitative, either.

Quotes:

“Critical Capability Thresholds describe model capabilities within specified risk domains that could cause severe public safety risks.” (p. 2)
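
For concreteness, a fully quantitative tolerance of the kind criteria 2.1.1.2 and 2.1.1.3 ask for might look like the sketch below. Every number is an invented assumption for illustration; Amazon's framework states no such values.

```python
# Illustrative only: risk tolerances expressed as a maximum annual probability
# of a scenario of at least a given severity. None of these values appear in
# Amazon's framework; they only show the requested form (probability x severity).
TOLERANCES = {
    # domain: (max annual probability, severity floor in fatalities)
    "CBRN Weapons Proliferation": (1e-4, 1_000),
    "Offensive Cyber Operations": (1e-3, 100),
}

def within_tolerance(domain: str, est_annual_probability: float, est_fatalities: int) -> bool:
    """True if the estimated risk stays within the stated tolerance for the domain."""
    max_p, severity_floor = TOLERANCES[domain]
    exceeds = est_fatalities >= severity_floor and est_annual_probability > max_p
    return not exceeds

# A risk model estimating a 5e-5 annual chance of a 2,000-fatality scenario
# stays within the illustrative CBRN tolerance above.
print(within_tolerance("CBRN Weapons Proliferation", 5e-5, 2_000))  # True
```

A partly quantitative tolerance (criterion 2.1.1.2) could keep the scenarios qualitative and attach only the probability bound.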

2.1.2 Process to define the tolerance (20%) 0%

2.1.2.1 AI developers engage in public consultations or seek guidance from regulators where available (50%) 0%

No evidence of engaging in public consultations or seeking guidance from regulators for risk tolerance.

Quotes:

No relevant quotes found.

2.1.2.2 Any significant deviations from risk tolerance norms established in other industries is justified and documented (e.g., cost-benefit analyses) (50%) 0%

No justification process: No evidence of considering whether their approach aligns with or deviates from established norms.

Quotes:

No relevant quotes found.

2.2 Operationalizing Risk Tolerance (65%) 22%

2.2.1 Key Risk Indicators (KRI) (30%) 21%

2.2.1.1 KRI thresholds are at least qualitatively defined for all risks (45%) 25%

Each risk domain has one KRI, which is qualitatively defined and grounded in risk modelling. To improve, they could have KRIs of more granular severity (e.g., ‘Level 1’ and ‘Level 2’), as well as multiple KRIs per risk domain to capture different attack pathways. For instance, “enabl[ing] a non-subject matter expert to reliably produce and deploy a CBRN weapon” is quite broad, as CBRN covers four different weapon types. Further, KRIs should map to the actual evaluations performed.

Quotes:

“Critical Capability Thresholds describe model capabilities within specified risk domains that could cause severe public safety risks.” (p. 2)

e.g. CBRN: “AI at this level will be capable of providing expert-level, interactive instruction that provides material uplift (beyond other publicly available research or tools) that would enable a non-subject matter expert to reliably produce and deploy a CBRN weapon.” (p. 2)

Offensive Cyber Operations: “AI at this level will be capable of providing material uplift (beyond other publicly available research or tools) that would enable a moderately skilled actor (e.g., an individual with undergraduate level understanding of offensive cyber activities or operations) to discover new, high-value vulnerabilities and automate the development and exploitation of such vulnerabilities.” (p. 2)

Automated AI R&D: “AI at this level will be capable of replacing human researchers and fully automating the research, development, and deployment of frontier models that will pose severe risk such as accelerating the development of enhanced CBRN weapons and offensive cybersecurity methods.” (p. 2)

2.2.1.2 KRI thresholds are quantitatively defined for all risks (45%) 25%

Two of the KRIs set the threshold at the point where the model provides “material uplift”, determined through comparison in uplift studies. This allows uplift to be “quantitatively assessed”. However, what counts as “material” uplift is not specified. To improve, quantitative thresholds should be given (see the illustrative sketch after the quotes below).

Quotes:

CBRN: “Critical Capability Threshold AI at this level will be capable of providing expert-level, interactive instruction that provides material uplift (beyond other publicly available research or tools) that would enable a non-subject matter expert to reliably produce and deploy a CBRN weapon.” (p. 2)

Offensive Cyber Operations: “AI at this level will be capable of providing material uplift (beyond other publicly available research or tools) that would enable a moderately skilled actor (e.g., an individual with undergraduate level understanding of offensive cyber activities or operations) to discover new, high-value vulnerabilities and automate the development and exploitation of such vulnerabilities.” (p. 2)

“Uplift studies evaluate whether a frontier model enhances the ability for a human to execute a specific type of attack when given access to a frontier model versus without access. “Uplift” can be quantitatively assessed through uplift studies, which use controlled trials to compare the abilities of a group with access to the frontier model to the abilities of a group without access to the frontier model. https://www.frontiermodelforum.org/updates/issue-brief-preliminary-taxonomy-of-predeployment-frontier-ai-safety-evaluations/” (p. 2)
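
To make “quantitatively assessed” concrete, the sketch below shows one standard way an uplift-study result can be reduced to a number and compared against a pre-registered threshold. The group sizes, success counts, 20-percentage-point threshold, and significance level are all invented for illustration; the framework specifies none of them.

```python
# Illustrative uplift-study analysis: compare task success rates of a group
# with frontier-model access against a control group with internet search only,
# using a two-proportion z-test. All numbers here are invented.
from math import sqrt, erf

def uplift_and_p_value(successes_model: int, n_model: int,
                       successes_control: int, n_control: int):
    """Return (uplift in success rate, two-sided p-value) for model vs. control."""
    p_m, p_c = successes_model / n_model, successes_control / n_control
    p_pool = (successes_model + successes_control) / (n_model + n_control)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_model + 1 / n_control))
    z = (p_m - p_c) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return p_m - p_c, p_value

uplift, p = uplift_and_p_value(successes_model=18, n_model=40,
                               successes_control=6, n_control=40)
MATERIAL_UPLIFT = 0.20  # hypothetical pre-registered threshold (20 percentage points)
material = uplift >= MATERIAL_UPLIFT and p < 0.05
print(f"uplift={uplift:.2f}, p={p:.4f}, material uplift: {material}")
```

Stating such a threshold and decision rule in advance is what would turn “material uplift” into a quantitative KRI.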

2.2.1.3 KRIs also identify and monitor changes in the level of risk in the external environment (10%) 0%

The KRIs only reference model capabilities.

Quotes:

No relevant quotes found.

2.2.2 Key Control Indicators (KCI) (30%) 18%

2.2.2.1 Containment KCIs (35%) 25%
2.2.2.1.1 All KRI thresholds have corresponding qualitative containment KCI thresholds (50%) 50%

There is a containment KCI threshold of “prevent[ing] unauthorized access to model weights or guardrails implemented as part of the [deployment measures], which could enable a malicious actor to remove or bypass existing guardrails to exceed Critical Capability Thresholds.” (p. 3) However, more detail could be added on what constitutes a “malicious actor”, and what level of assurance is required.

The KCI clearly links to each Critical Capability Threshold.

Quotes:

“Upon determining that an Amazon model has reached a Critical Capability Threshold, we will implement a set of Safety Measures and Security Measures to prevent elicitation of the critical capability identified and to protect against inappropriate access risks. Safety Measures are designed to prevent the elicitation of the observed Critical Capabilities following deployment of the model. Security Measures are designed to prevent unauthorized access to model weights or guardrails implemented as part of the Safety Measures, which could enable a malicious actor to remove or bypass existing guardrails to exceed Critical Capability Thresholds.” (p. 3)

2.2.2.1.2 All KRI thresholds have corresponding quantitative containment KCI thresholds (50%) 0%

The containment KCI is only qualitative. To improve, the containment KCI should be described as a measurable target, that has precise quantitative indications for when it is reached.

Quotes:

“Upon determining that an Amazon model has reached a Critical Capability Threshold, we will implement a set of Safety Measures and Security Measures to prevent elicitation of the critical capability identified and to protect against inappropriate access risks. Safety Measures are designed to prevent the elicitation of the observed Critical Capabilities following deployment of the model. Security Measures are designed to prevent unauthorized access to model weights or guardrails implemented as part of the Safety Measures, which could enable a malicious actor to remove or bypass existing guardrails to exceed Critical Capability Thresholds.” (p. 3)

2.2.2.2 Deployment KCIs (35%) 25%
2.2.2.2.1 All KRI thresholds have corresponding qualitative deployment KCI thresholds (50%) 50%

The criterion is partially addressed – there is an indication that deployment KCI measures must sufficiently “[prevent] reliable elicitation of the capability by malicious actors”. However, “reliable elicitation” and “malicious” should be more precisely defined, and should reference relevant threat actors/their resources for elicitation.

The KCI should also be tied to specific KRIs – for instance, the deployment KCI likely differs for a model that crosses the Critical Capability Threshold for Offensive Cyber Operations versus for Automated AI R&D.

Quotes:

“Upon determining that an Amazon model has reached a Critical Capability Threshold, we will implement a set of Safety Measures and Security Measures to prevent elicitation of the critical capability identified and to protect against inappropriate access risks. Safety Measures are designed to prevent the elicitation of the observed Critical Capabilities following deployment of the model. Security Measures are designed to prevent unauthorized access to model weights or guardrails implemented as part of the Safety Measures, which could enable a malicious actor to remove or bypass existing guardrails to exceed Critical Capability Thresholds.” (p. 3)

“We will evaluate models following the application of these safeguards to ensure that they adequately mitigate the risks associated with the Critical Capability Threshold. In the event these evaluations reveal that an Amazon frontier model meets or exceeds a Critical Capability Threshold and our Safety and Security Measures are unable to appropriately mitigate the risks (e.g., by preventing reliable elicitation of the capability by malicious actors), we will not deploy the model until we have identified and implemented appropriate additional safeguards.” (p. 3)

2.2.2.2.2 All KRI thresholds have corresponding quantitative deployment KCI thresholds (50%) 0%


There are no quantitative deployment KCI thresholds given.

Quotes:

No relevant quotes found.

2.2.2.3 For advanced KRIs, assurance process KCIs are defined (30%) 0%

There are no assurance processes KCIs defined. The framework does not provide recognition of there being KCIs outside of containment and deployment measures.

Quotes:

No relevant quotes found.

2.2.3 Pairs of thresholds are grounded in risk modeling to show that risks remain below the tolerance (20%) 25%

There is an awareness that KRIs and KCIs must pair together for risk to remain below the tolerance before a model is publicly deployed (the KCI is implied here by the requirement of “appropriate risk mitigation measures”). However, there is no justification that the KRI and KCI thresholds given are sufficient to keep residual risk below the risk tolerance.

Quotes:

“If predeployment evaluations demonstrate that a model has capabilities that meet or exceed a Critical Capability Threshold, the model will not be publicly deployed without appropriate risk mitigation measures.” (p. 1)

2.2.4 Policy to put development on hold if the required KCI threshold cannot be achieved, until sufficient controls are implemented to meet the threshold (20%) 25%

The framework mentions multiple times that models will not be deployed if the implied required KCI threshold cannot be achieved. However, they do not commit to putting development on hold, and it is unclear if “deployment” excludes internal deployments, as some of the quotes mention only preventing public deployment.

Quotes:

“At its core, this Framework reflects our commitment that we will not deploy frontier AI models developed by Amazon that exceed specified risk thresholds without appropriate safeguards in place.” (p. 1)

“When a maximal capability evaluation indicates that a model has hit a Critical Capability Threshold, we will not deploy the model until we have implemented appropriate safeguards.” (p. 3)

“In the event these evaluations reveal that an Amazon frontier model meets or exceeds a Critical Capability Threshold and our Safety and Security Measures are unable to appropriately mitigate the risks (e.g., by preventing reliable elicitation of the capability by malicious actors), we will not deploy the model until we have identified and implemented appropriate additional safeguards.” (p. 3)

“If predeployment evaluations demonstrate that a model has capabilities that meet or exceed a Critical Capability Threshold, the model will not be publicly deployed without appropriate risk mitigation measures.” (p. 1)

3.1 Implementing Mitigation Measures (50%) 38%

3.1.1 Containment measures (35%) 74%

3.1.1.1 Containment measures are precisely defined for all KCI thresholds (60%) 90%

There is substantial detail about containment measures; it is precise, comprehensive, and nuanced. While the measures are not explicitly tied to the KCI threshold, it is assumed that they are all implemented for current models as well as for those crossing critical capability thresholds. However, more detail should be given on how containment measures differ for critical models.

Quotes:

“At Amazon, security is job zero. AWS is architected to be the most secure global cloud infrastructure on which to build, migrate, and manage applications and workloads, including AI. This is backed by the trust of our millions of customers, including the most security sensitive organizations like government, healthcare, and financial services. With regard to development and deployment of our frontier models, our security measures will build on the strong foundation of security practices that apply across our company today. We describe our current practices in greater detail in Appendix A. Below are some key elements of our existing security approach that we use to safeguard our frontier models:

  • Secure compute and networking environments. The Trainium or GPU-enabled compute nodes used for AI model training and inference within the AWS environment are based on the EC2 Nitro system, which provides confidential computing properties natively across the fleet. Compute clusters run in isolated Virtual Private Cloud network environments. All development of frontier models that occurs in AWS accounts meets the required security bar for careful configuration and management. These accounts include both identity-based and network-based boundaries, perimeters, and firewalls, as well as enhanced logging of security-relevant metadata such as netflow data and DNS logs.
  • Advanced data protection capabilities. For models developed on AWS, model data and intermediate checkpoint results in compute clusters are stored using AES-256 GCM encryption with data encryption keys backed by the FIPS 140-2 Level 3 certified AWS Key Management Service. Software engineers and data scientists must be members of the correct Critical Permission Groups and authenticate with hardware security tokens from enterprise-managed endpoints in order to access or operate on any model systems or data. Any local, temporary copies of model data used for experiments and testing are also fully encrypted in transit and at rest.
  • Security monitoring, operations, and response. Amazon’s automated threat intelligence and defense systems detect and mitigate millions of threats each day. These systems are backed by human experts for threat intelligence, security operations, and security response. Threat sharing with other providers and government agencies provides collective defense and response.” (p. 3)

Many more containment measures are listed in Appendix A, filling nearly three pages. For instance,
“Secure AI infrastructure and development environment. All AI accelerator or GPU-enabled compute nodes used for AI model training and inference within the AWS environment are based on the EC2 Nitro system, which provides confidential computing properties natively across the fleet. Compute clusters run in isolated virtual private cloud network environments. All model data and intermediate checkpoint results are stored using AES-256 GCM encryption with data encryption keys backed by KMS. All development of frontier models occurs in AWS accounts that meet the required security bar for careful configuration and management. These accounts include both identity-based and network-based boundaries, perimeters, and firewalls, as well as enhanced logging of security-relevant metadata such as netflow data and DNS logs. The AWS GuardDuty intrusion detection service is enabled, providing automatic monitoring for potential security threats, searching for indicators of compromise, and surfacing high priority alerts as appropriate. Software engineers and data scientist must be members of the correct Critical Permission Groups and authenticate with hardware security tokens from enterprise-managed endpoints in order to access or operate on any model systems or data. Any local, temporary copies of model data used for experiments and testing are also fully encrypted in transit and at rest at all times.” (p. 8)

3.1.1.2 Proof that containment measures are sufficient to meet the thresholds (40%) 50%

There are structured internal processes by which containment measures are reviewed and tested for sufficiency. However, these are not tied directly to the KCI threshold they give of “preventing reliable elicitation of [critical capabilities] by malicious actors”. More detail could also be given on how they “evaluate models […] to ensure that they adequately mitigate the risks associated with the Critical Capability Threshold.” Importantly, they do not offer proof of why they believe their containment measures are sufficient for this containment KCI threshold; however, their “use of formal methods to ensure correctness of security-critical components and subsystems” lends itself readily to providing this type of evidence (see the illustrative sketch after the quotes below), so partial credit is given.

Quotes:

“We will evaluate models following the application of these [safety and security] safeguards to ensure that they adequately mitigate the risks associated with the Critical Capability Threshold.” (p. 3)

“Secure design, security reviews, and security testing. […] At the same time, central security teams provide enhanced capabilities and expertise that all engineering teams rely on, including through security architecture reviews, threat modeling exercises, assessments to ensure compliance with all corporate security policies and practices, penetration testing, red teaming services, and the operation of bug bounty programs to enlist the help of outside experts. In the end, all software and AI projects at Amazon must undergo and pass a full security and safety review by one of the central security teams.” (p. 7)

“Use of formal methods to ensure correctness of security-critical components and subsystems. Amazon makes wide usage of the area of computer science known as automating reasoning (AR), a branch of artificial intelligence that utilizes math and logic to prove the correctness of key software systems. Critical security components such as encryption algorithms, authorization systems, automatic privilege reduction features, and network security components and libraries, are developed by first creating ideal models of software systems and all their desired states, and then mathematically proving that the accompanying software implementation satisfies all the properties of the model. These proofs are incorporated into the software development lifecycle such that all changes or additions to these critical code bases have the proofs run against them automatically, and any code update that fails to pass a proof is rejected. AWS also applies AR to GenAI itself in order to help manage the problem of hallucinations” (p. 8)
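
The framework does not show what such ex ante evidence would look like. As a rough illustration only, the toy example below uses the open-source Z3 solver (an assumed stand-in; the framework does not name Amazon's automated-reasoning tooling) to prove that a simple access-control rule satisfies a security property for all inputs, rather than only for sampled test cases. Extending this style of argument to the actual containment KCI is the kind of proof this criterion looks for.

```python
# Toy illustration of proving (rather than testing) a security property.
# This is NOT Amazon's tooling or codebase; it only shows the shape of the
# evidence that formal methods can provide.
# Requires: pip install z3-solver
from z3 import Bools, Solver, And, Not, Implies, unsat

in_permission_group, has_hardware_token, access_granted = Bools(
    "in_permission_group has_hardware_token access_granted"
)

# Model of a hypothetical access rule: access is granted only with group
# membership AND a hardware security token (loosely mirroring the access
# controls quoted in 3.1.1.1 above).
rule = access_granted == And(in_permission_group, has_hardware_token)

# Property: a caller outside the permission group is never granted access.
prop = Implies(Not(in_permission_group), Not(access_granted))

solver = Solver()
solver.add(rule, Not(prop))        # search for any counterexample to the property
assert solver.check() == unsat     # unsat: no counterexample exists; property proved
print("property holds for all inputs")
```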

3.1.1.3 Strong third party verification process to verify that the containment measures meet the threshold (100% if 3.1.1.3 > [60% x 3.1.1.1 + 40% x 3.1.1.2]) 0%

There is no detail of third-party verification that containment measures meet the KCI threshold.

Quotes:

No relevant quotes found.

3.1.2 Deployment measures (35%) 25%

3.1.2.1 Deployment measures are precisely defined for all KCI thresholds (60%) 25%

While they define deployment measures in general, these are not tied to KCI thresholds or to specific risk domains. For instance, the deployment measures for models that cross the Critical Capability Threshold in Offensive Cyber Operations may differ from those for models that cross the Critical Capability Threshold in Automated AI R&D.

Quotes:

“Examples of current safety mitigations include:

  • Training Data Safeguards: We implement a rigorous data review process across various model training stages that aims to identify and redact data that could give rise to unsafe behaviors.
  • Alignment Training: We implement automated methods to ensure we meet the design objectives for each of Amazon’s responsible AI dimensions, including safety and security. Both supervised fine tuning (SFT) and learning with human feedback (LHF) are used to align models. Training data for these alignment techniques are sourced in collaboration with domain experts to ensure alignment of the model towards the desired behaviors.
  • Harmful Content Guardrails: Application of runtime input and output moderation systems serve as a first and last line of defense and enable rapid response to newly identified threats or gaps in model alignment. Input moderation systems detect and either block or safely modify prompts that contain malicious, insecure or illegal material, or attempt to bypass the core model alignment (e.g. prompt injection, jail-breaking). Output moderation systems ensure that the content adheres to our Amazon Responsible AI objectives by blocking or safely modifying violating outputs.
  • Fine-tuning Safeguards: Models are trained in a manner that makes them resilient to malicious customer fine-tuning efforts that could undermine initial Responsible AI alignment training by the Amazon team.
  • Incident Response Protocols: Incident escalation and response pathways enable rapid remediation of reported AI safety incidents, including jailbreak remediation.” (p. 3)

3.1.2.2 Proof that deployment measures are sufficient to meet the thresholds (40%) 25%

They mention that models will be evaluated to “ensure that they adequately mitigate the risks associated with Critical Capability Thresholds”. Similarly, they describe engaging in a “safeguards evaluation” to “assess the adequacy of the risk mitigation measures that are applied to a model.” However, detail is not given on how this evaluation is conducted, nor on the criteria for determining whether mitigation measures are sufficient. Further, proof should be provided ex ante of why they believe their deployment measures will meet the relevant KCI threshold.

Quotes:

“Our evaluation process includes “maximal capability evaluations” to determine the outer bounds of our models’ Critical Capabilities and a subsequent “safeguards evaluation” to assess the adequacy of the risk mitigation measures that are applied to a model.” (p. 3)

“We will evaluate models following the application of these safeguards to ensure that they adequately mitigate the risks associated with the Critical Capability Threshold.” (p. 3)

3.1.2.3 Strong third party verification process to verify that the deployment measures meet the threshold (100% if 3.1.2.3 > [60% x 3.1.2.1 + 40% x 3.1.2.2]) 0%

There is no detail of third-party verification that deployment measures meet the KCI threshold.

Quotes:

No relevant quotes found.

3.1.3 Assurance processes (30%) 10%

3.1.3.1 Credible plans towards the development of assurance properties (40%) 25%

There is a commitment to collaborating with academics to advance AI safety R&D, which likely entails research aimed at developing assurance processes: “these channels enable us to […] discover promising approaches towards aligning our frontier models.”

However, they do not address: (a) at what KRI the assurance processes become necessary, and (b) justification for why they believe they will have sufficient assurance processes by the time the relevant KRI is reached, including (c) technical milestones and estimates of when these milestones will need to be reached given forecasted capabilities growth.

Quotes:

“Advancing the Science of Safe, Secure AI: While a robust set of measures to mitigate the risk of frontier AI exists today, we are dedicated to furthering AI safety and security as the technology matures and becomes more sophisticated in the future. To this end, we foster the development of new safety and security measures through participation and investment in the following activities. Efforts to develop further safety measures include: […]
Fostering academic research for development of cutting-edge alignment techniques: Through initiatives such as the Amazon Research Awards and Amazon Research centers (e.g. USC + Amazon Center on Secure & Trusted Machine Learning, Amazon/ MIT Science Hub), we work with leading academic partners conducting research on frontier AI risks and novel risk mitigation approaches. Additionally, we advance our own research and publish findings in safety conferences, while borrowing learnings presented by other academic institutions at similar venues.

Investments in advanced AI safety R&D: At Amazon, we accelerate our work in AI safety through initiatives such as our Amazon AGI SF Lab and the Trusted AI Challenge. These channels enable us to leverage the work of subject matter experts and discover promising approaches towards aligning our frontier models.” (p. 4)

3.1.3.2 Evidence that the assurance properties are enough to achieve their corresponding KCI thresholds (40%) 0%

There is no mention of providing evidence that the assurance processes are sufficient.

Quotes:

No relevant quotes found.

3.1.3.3 The underlying assumptions that are essential for their effective implementation and success are clearly outlined (20%) 10%

There is no mention of the assumptions essential for effective implementation of assurance process measures. There is some mention of assurance process measures: “Alignment training: […] Both supervised fine tuning (SFT) and learning with human feedback (LHF) are used to align models.” But the underlying assumptions essential for effective implementation (i.e., that alignment training successfully aligns the model) are not given. There is some awareness that assurance (i.e., an argument with its assumptions laid out) about mitigations is necessary: “Amazon’s senior leadership will review the plan for applying risk mitigations to address the Critical Capability, how we measure and have assurance about those mitigations, and approve the mitigations prior to deployment.” Partial credit is given.

Quotes:

“Alignment training: We implement automated methods to ensure we meet the design objectives for each of Amazon’s responsible AI dimensions, including safety and security. Both supervised fine tuning (SFT) and learning with human feedback (LHF) are used to align models. Training data for these alignment techniques are sourced in collaboration with domain experts to ensure alignment of the model towards the desired behaviors.” (p. 3)

“Amazon’s senior leadership will review the plan for applying risk mitigations to address the Critical Capability, how we measure and have assurance about those mitigations, and approve the mitigations prior to deployment.” (p. 5)

3.2 Continuous Monitoring and Comparing Results with Pre-determined Thresholds (50%) 8%

3.2.1 Monitoring of KRIs (40%) 13%

3.2.1.1 Justification that elicitation methods used during the evaluations are comprehensive enough to match the elicitation efforts of potential threat actors (30%) 25%

The framework describes determining the ‘outer bounds’ of capabilities, but does not provide detail as to (a) how this is done, or (b) why this methodology is comprehensive enough. There also does not appear to be an awareness that elicitation efforts should match those of potential threat actors.

Quotes:

“Our evaluation process includes “maximal capability evaluations” to determine the outer bounds of our models’ Critical Capabilities” (p. 3)

 

3.2.1.2 Evaluation frequency (25%) 0%

There is a commitment to conduct evaluations on an “ongoing basis”, and to “re-evaluate deployed models prior to any major updates that could meaningfully enhance underlying capabilities.” However, the specifics of this frequency are not given. To improve, frequency should be determined in terms of both a fixed time period and the relative variation of effective compute used in training, to give structure and to allow for unexpected emergent behaviours or post-training enhancements (see the illustrative sketch after the quote below).

Quotes:

“We conduct evaluations on an ongoing basis, including during training and prior to deployment of new frontier models. We will re-evaluate deployed models prior to any major updates that could meaningfully enhance underlying capabilities.” (p. 3)
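
A purely illustrative trigger rule combining those two elements follows; the 90-day window and 4x effective-compute ratio are invented values, not commitments from the framework.

```python
# Illustrative re-evaluation trigger: re-run KRI evaluations after a fixed time
# window OR after a given relative growth in effective training compute,
# whichever comes first. Both thresholds are invented for illustration.
from datetime import date

def needs_reevaluation(last_eval: date, today: date,
                       effective_compute_now: float,
                       effective_compute_at_last_eval: float,
                       max_days: int = 90,
                       max_compute_ratio: float = 4.0) -> bool:
    stale = (today - last_eval).days >= max_days
    compute_jump = (effective_compute_now / effective_compute_at_last_eval
                    >= max_compute_ratio)
    return stale or compute_jump

# A 6x jump in effective compute triggers re-evaluation even within the window.
print(needs_reevaluation(date(2025, 1, 1), date(2025, 2, 1), 6e25, 1e25))  # True
```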

3.2.1.3 Description of how post-training enhancements are factored into capability assessments (15%) 0%

There is no description of how post-training enhancements are factored into capability assessments. To improve, a process should be described that accounts for post-training enhancements, either by implementing and monitoring a safety margin or by applying the latest post-training enhancements to upper-bound elicitation with some confidence (a minimal sketch of the former follows below).

Quotes:

No relevant quotes found.
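
A safety margin of the kind suggested above could be as simple as the sketch below; the 10-point margin is an invented assumption, and in practice it would need to be calibrated against observed gains from fine-tuning, scaffolding, and other post-training enhancements.

```python
# Illustrative safety margin: treat a model as having reached a KRI threshold
# once its evaluated score comes within a margin of that threshold, so later
# post-training enhancements cannot silently push it across. The margin value
# is invented for illustration.
def kri_reached_with_margin(eval_score: float, kri_threshold: float,
                            post_training_margin: float = 10.0) -> bool:
    return eval_score >= kri_threshold - post_training_margin

# A score of 47 against a threshold of 55 already counts as reached,
# because the margin absorbs expected post-training gains.
print(kri_reached_with_margin(eval_score=47.0, kri_threshold=55.0))  # True
```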

3.2.1.4 Vetting of protocols by third parties (15%) 10%

There is some description of vetting automated benchmarks with experts (though these may not necessarily be external), by building the evaluation methodologies “in collaboration with experts.” To improve, the framework should describe a process for having third parties review the protocol for determining KRI status.

Quotes:

“We conduct comprehensive evaluations to assess our frontier models using state-of-the-art public benchmarks in addition to internal benchmarking on proprietary test sets built in collaboration with experts.” (p. 3)

3.2.1.5 Replication of evaluations by third parties (15%) 25%

There is a commitment to external red-teaming, but not to having evaluations such as automated benchmarks or uplift studies conducted/audited by third parties.

Quotes:

“Expert Red Teaming: Red teaming vendors and in-house red teaming experts test our models for safety and security. We work with specialized firms and academics to red-team our models to evaluate them for risks that require domain specific expertise.” (p. 3)

“Learning from our red teaming network: We continue to build our strong network of internal and external red teamers including red teamers with deep subject matter expertise in risks related to critical capabilities. These experts are critical in surfacing early insights into emerging critical capabilities and help us identify and implement appropriate mitigations.” (p. 4)

3.2.2 Monitoring of KCIs (40%) 0%

3.2.2.1 Detailed description of evaluation methodology and justification that KCI thresholds will not be crossed unnoticed (40%) 0%

There is no mention of monitoring mitigation effectiveness after safeguards assessment. There are incident response protocols, but these do not mention reviewing mitigations, only remediation of incidents.

Quotes:

“Incident Response Protocols: Incident escalation and response pathways enable rapid remediation of reported AI safety incidents, including jailbreak remediation.” (p. 4)

“We will evaluate models following the application of these safeguards to ensure that they adequately mitigate the risks associated with the Critical Capability Threshold.” (p. 4)

“Frontier models developed by Amazon will be subject to maximal capability evaluations and safeguards evaluations prior to deployment.” (p. 5)

3.2.2.2 Vetting of protocols by third parties (30%) 0%

There is no mention of KCIs protocols being vetted by third parties.

Quotes:

No relevant quotes found.

3.2.2.3 Replication of evaluations by third parties (30%) 0%

There is no mention of control evaluations/mitigation testing being replicated or conducted by third-parties.

Quotes:

No relevant quotes found.

3.2.3 Transparency of evaluation results (10%) 21%

3.2.3.1 Sharing of evaluation results with relevant stakeholders as appropriate (85%) 25%

There is a commitment to publishing “information about” evaluations, implicitly publicly; however, this is not the same as publishing all KCI and KRI assessments publicly. There is also mention of sharing “findings related to our models” with other AI companies. To improve, the framework should detail a process for notifying authorities if KRI thresholds are crossed, and should publish KCI evaluations as well as KRI evaluations.

Quotes:

“Amazon will publish, in connection with the launch of a frontier AI model launch (in model documentation, such as model service cards), information about the frontier model evaluation for safety and security.” (p. 5)

“Information sharing and best practices development: Engagement in fora that bring together companies developing frontier models (e.g. Frontier Model Forum and Partnership on AI) and organized by government agencies (e.g. National Institute of Standards and Technologies). These platforms serve as an opportunity to share findings related to our models and to adopt recommendations from other leading companies.” (p. 4)

3.2.3.2 Commitment to non-interference with findings (15%) 0%

No commitment to permitting the reports, which detail the results of external evaluations (i.e. any KRI or KCI assessments conducted by third parties), to be written independently and without interference or suppression.

Quotes:

No relevant quotes found.

3.2.4 Monitoring for novel risks (10%) 5%

3.2.4.1 Identifying novel risks post-deployment: engages in some process (post deployment) explicitly for identifying novel risk domains or novel risk models within known risk domains (50%) 10%

Whilst there is a focus on security monitoring, there is no process defined for identifying novel risks or risk profiles. They do mention collaborating on threat modeling to update their critical capability thresholds for “evolving (and potentially new) threats”. To improve, a rigorous process for identifying such threats should be detailed, along with justification for why they believe this is likely to identify novel threats.

Quotes:

“Collaboration on threat modeling and updated Critical Capability Thresholds: Amazon is committed to partnering with governments, domain experts, and industry peers to continuously improve Amazon’s awareness of the threat environment and ensure that our Critical Capability Thresholds and evaluation processes account for evolving (and potentially new) threats.” (p. 4)

“Learning from our red teaming network: We continue to build our strong network of internal and external red teamers including red teamers with deep subject matter expertise in risks related to critical capabilities. These experts are critical in surfacing early insights into emerging critical capabilities and help us identify and implement appropriate mitigations.” (p. 4)

3.2.4.2 Mechanism to incorporate novel risks identified post-deployment (50%) 0%

While the framework mentions “collaboration on threat modeling” and “learning from our red teaming network”, to improve it should define a process for incorporating novel risks into its risk models as they arise.

Quotes:

“Collaboration on threat modeling and updated Critical Capability Thresholds: Amazon is committed to partnering with governments, domain experts, and industry peers to continuously improve Amazon’s awareness of the threat environment and ensure that our Critical Capability Thresholds and evaluation processes account for evolving (and potentially new) threats.” (p. 4)

“Learning from our red teaming network: We continue to build our strong network of internal and external red teamers including red teamers with deep subject matter expertise in risks related to critical capabilities. These experts are critical in surfacing early insights into emerging critical capabilities and help us identify and implement appropriate mitigations.” (p. 4)

4.1 Decision-making (25%) 34%

4.1.1 The company has clearly defined risk owners for every key risk identified and tracked (25%) 25%

While the framework does not delineate risk owners exactly, it lists a number of decision-making stakeholders.

Quotes:

“The team performing the Critical Capability Threshold evaluations will report to Amazon senior leadership any evaluation that exceeds the Critical Capability Threshold. The report will be directed to the SVP for the model development team, the Chief Security Officer, and legal counsel.” (p. 5)

4.1.2 The company has a dedicated risk committee at the management level that meets regularly (25%) 10%

The framework does not mention a specific committee, but mentions leadership review.

Quotes:

“Amazon’s senior leadership will review the plan for applying risk mitigations to address the Critical Capability, how we measure and have assurance about those mitigations, and approve the mitigations prior to deployment.” (p. 5)

4.1.3 The company has defined protocols for how to make go/no-go decisions (25%) 75%

The framework outlines clear decision-making protocols, including the basis for decisions and the decision makers.

Quotes:

“Amazon’s senior leadership will review the plan for applying risk mitigations to address the Critical Capability, how we measure and have assurance about those mitigations, and approve the mitigations prior to deployment.” (p. 5)
“Frontier models developed by Amazon will be subject to maximal capability evaluations and safeguards evaluations prior to deployment. The results of these evaluations will be reviewed during launch processes. Models may not be publicly released unless safeguards are applied.” (p. 5)
“Amazon’s senior leadership will likewise review the safeguards evaluation report as part of a go/no-go decision.” (p. 5)
“In the event these evaluations reveal that an Amazon frontier model meets or exceeds a Critical Capability Threshold and our Safety and Security Measures are unable to appropriately mitigate the risks (e.g., by preventing reliable elicitation of the capability by malicious actors), we will not deploy the model until we have identified and implemented appropriate additional safeguards.” (p. 3)

4.1.4 The company has defined escalation procedures in case of incidents (25%) 25%

The framework mentions the existence of incident escalation protocols.

Quotes:

“Incident Response Protocols: Incident escalation and response pathways enable rapid remediation of reported AI safety incidents, including jailbreak remediation.” (p. 4)

4.2. Advisory and Challenge (20%) 14%

4.2.1 The company has an executive risk officer with sufficient resources (16.7%) 0%

No mention of an executive risk officer.

Quotes:

No relevant quotes found.

4.2.2 The company has a committee advising management on decisions involving risk (16.7%) 0%

No mention of an advisory committee.

Quotes:

No relevant quotes found.

4.2.3 The company has an established system for tracking and monitoring risks (16.7%) 25%

The framework outlines some measures for tracking risk.

Quotes:

“We will use a range of methods to evaluate frontier models for capabilities that are as closely correlated to the Critical Capability Thresholds as possible. In most cases a single evaluation will not be sufficient for an informed determination as to whether a model has hit a Critical Capability Threshold.” (p. 3)

“Amazon’s threat intelligence, Trust & Safety, and insider threat teams are building additional capabilities to track advanced threat actors and how they interact with and attempt to subvert security measures surrounding AI models.” (p. 5)

4.2.4 The company has designated people that can advise and challenge management on decisions involving risk (16.7%) 10%

There is no clear mention of advisory and challenge functions, but reviews by several involved stakeholders are listed.

Quotes:

“Frontier models developed by Amazon will be subject to maximal capability evaluations and safeguards evaluations prior to deployment. The results of these evaluations will be reviewed during launch processes. Models may not be publicly released unless safeguards are applied. The team performing the Critical Capability Threshold evaluations will report to Amazon senior leadership any evaluation that exceeds the Critical Capability Threshold. The report will be directed to the SVP for the model development team, the Chief Security Officer, and legal counsel. Amazon’s senior leadership will review the plan for applying risk mitigations to address the Critical Capability, how we measure and have assurance about those mitigations, and approve the mitigations prior to deployment. Amazon’s senior leadership will likewise review the safeguards evaluation report as part of a go/no-go decision.” (p. 5)

4.2.5 The company has an established system for aggregating risk data and reporting on risk to senior management and the Board (16.7%) 50%

The framework clearly states how risk will be reported to senior management.

Quotes:

“The team performing the Critical Capability Threshold evaluations will report to Amazon senior leadership any evaluation that exceeds the Critical Capability Threshold. The report will be directed to the SVP for the model development team, the Chief Security Officer, and legal counsel.” (p. 5)
“Amazon’s senior leadership will review the plan for applying risk mitigations to address the Critical Capability, how we measure and have assurance about those mitigations, and approve the mitigations prior to deployment.” (p. 5)

4.2.6 The company has an established central risk function (16.7%) 0%

No mention of a central risk function.

Quotes:

No relevant quotes found.

4.3 Audit (20%) 25%

4.3.1 The company has an internal audit function involved in AI governance (50%) 0%

No mention of an internal audit function.

Quotes:

No relevant quotes found.

4.3.2 The company involves external auditors (50%) 50%

The framework includes external red teams, but does not specify whether they will have auditor independence.

Quotes:

“We work with specialized firms and academics to red-team our models to evaluate them for risks that require domain specific expertise.” (p. 3)
“Red teaming vendors and in-house red teaming experts test our models for safety and security.” (p. 3)

4.4 Oversight (20%) 0%

4.4.1 The Board of Directors of the company has a committee that provides oversight over all decisions involving risk (50%) 0%

No mention of a Board risk committee.

Quotes:

No relevant quotes found.

4.4.2 The company has other governing bodies outside of the Board of Directors that provide oversight over decisions (50%) 0%

No mention of any additional governance bodies.

Quotes:

No relevant quotes found.

4.5 Culture (10%) 20%

4.5.1 The company has a strong tone from the top (33.3%) 50%

The framework includes a commitment to mitigate risk.

Quotes:

“As we continue to scale the capabilities of Amazon’s frontier models and democratize access to the benefits of AI, we also take responsibility for mitigating the risks of our technology. Consistent with Amazon’s endorsement of the Korea Frontier AI Safety Commitments, this Framework outlines the protocols we will follow to ensure that frontier models developed by Amazon do not expose critical capabilities that have the potential to create severe risks. At its core, this Framework reflects our commitment that we will not deploy frontier AI models developed by Amazon that exceed specified risk thresholds without appropriate safeguards in place.” (p. 1)

4.5.2 The company has a strong risk culture (33.3%) 0%

No mention of elements of risk culture.

Quotes:

No relevant quotes found.

4.5.3 The company has a strong speak-up culture (33.3%) 0%

No mention of elements of speak-up culture.

Quotes:

No relevant quotes found.

4.6 Transparency (5%) 67%

4.6.1 The company reports externally on what their risks are (33.3%) 50%

The framework clearly lists the risks in scope and a commitment to model documentation.

Quotes:

“Chemical, Biological, Radiological, and Nuclear (CBRN) Weapons Proliferation…Offensive Cyber Operations…Automated AI R&D” (p. 2)
“Amazon will publish, in connection with the launch of a frontier AI model launch (in model documentation, such as model service cards), information about the frontier model evaluation for safety and security.” (p. 5)

4.6.2 The company reports externally on what their governance structure looks like (33.3%) 75%

The framework includes significant detail on governance mechanisms.

Quotes:

“Internally, we will use this framework to guide our model development and launch decisions. The implementation of the framework will require: The Frontier Model Safety Framework will be incorporated into the Amazon-wide Responsible AI Governance Program, enabling Amazon-wide visibility into the expectations, mechanisms, and adherence to the Framework. Frontier models developed by Amazon will be subject to maximal capability evaluations and safeguards evaluations prior to deployment. The results of these evaluations will be reviewed during launch processes. Models may not be publicly released unless safeguards are applied. The team performing the Critical Capability Threshold evaluations will report to Amazon senior leadership any evaluation that exceeds the Critical Capability Threshold. The report will be directed to the SVP for the model development team, the Chief Security Officer, and legal counsel. Amazon’s senior leadership will review the plan for applying risk mitigations to address the Critical Capability, how we measure and have assurance about those mitigations, and approve the mitigations prior to deployment. Amazon’s senior leadership will likewise review the safeguards evaluation report as part of a go/no-go decision…As we advance our work on frontier models, we will also continue to enhance our AI safety evaluation and risk management processes. This evolving body of work requires an evolving framework as well. We will therefore revisit this Framework at least annually and update it as necessary to ensure that our protocols are appropriately robust to uphold our commitment to deploy safe and secure models. We will also update this Framework as needed in connection with significant technological developments.” (p. 5)

4.6.3 The company shares information with industry peers and government bodies (33.3%) 75%

The framework mentions information sharing with a wide range of other entities.

Quotes:

“Threat sharing with other providers and government agencies provides collective defense and response.” (p. 4)
“Collaboration on threat modeling and updated Critical Capability Thresholds: Amazon is committed to partnering with governments, domain experts, and industry peers to continuously improve Amazon’s awareness of the threat environment and ensure that our Critical Capability Thresholds and evaluation processes account for evolving (and potentially new) threats.” (p. 4)
“Amazon will utilize relevant industry bodies such as the Frontier Model Forum to share threat patterns and indicators, as well as responses and mitigations where appropriate, to enable better collective defense will against adversaries seeking to undermine frontier model security.” (p. 5)
“Information sharing and best practices development: Engagement in fora that bring together companies developing frontier models (e.g. Frontier Model Forum and Partnership on AI) and organized by government agencies (e.g. National Institute of Standards and Technologies).” (p. 4)
“These platforms serve as an opportunity to share findings related to our models and to adopt recommendations from other leading companies.” (p. 4)
