Comparison table

Presenting our findings

Our methodology allows for extensive, apples-to-apples comparisons between companies on over 65 different risk management criteria.

The table below shows both the overall rating and the rating for each criterion. We also present the best in class for each criterion, to encourage other companies to adopt these industry-leading risk management practices.

For more detail on the rationale behind each rating, you can click on the score itself to see a detailed explanation with supporting quotes. To see a company's full breakdown, you can click on the company icon.

For more detail on how scores are calculated, see our methodology using the button below.

See our methodology

1. Risk Identification
1.1 Classification of Applicable Known Risks (40%)
1.1.1 Risks from literature and taxonomies are well covered (50%)
1.1.2 Exclusions are clearly justified and documented (50%)
1.2 Identification of Unknown Risks (Open-ended red teaming) (20%)
1.2.1 Internal open-ended red teaming (70%)
1.2.2 Third party open-ended red teaming (30%)
1.3 Risk modeling (40%)
1.3.1 The company uses risk models for all the risk domains identified and the risk models are published (with potentially dangerous information redacted) (40%)
1.3.2 Risk modeling methodology (40%)
1.3.2.1 Methodology precisely defined (70%)
1.3.2.2 Mechanism to incorporate red teaming findings (15%)
1.3.2.3 Prioritization of severe and probable risks (15%)
1.3.3 Third party validation of risk models (20%)
2. Risk Analysis and Evaluation
2.1 Setting a Risk Tolerance (35%)
2.1.1 Risk tolerance is defined (80%)
2.1.1.1 Risk tolerance is at least qualitatively defined for all risks (33%)
2.1.1.2 Risk tolerance is expressed at least partly quantitatively as a combination of scenarios (qualitative) and probabilities (quantitative) for all risks (33%)
2.1.1.3 Risk tolerance is expressed fully quantitatively as a product of severity (quantitative) and probability (quantitative) for all risks (33%)
2.1.2 Process to define the tolerance (20%)
2.1.2.1 AI developers engage in public consultations or seek guidance from regulators where available (50%)
2.1.2.2 Any significant deviations from risk tolerance norms established in other industries are justified and documented (e.g., cost-benefit analyses) (50%)
2.2 Operationalizing Risk Tolerance (65%)
2.2.1 Key Risk Indicators (KRI) (30%)
2.2.1.1 KRI thresholds are at least qualitatively defined for all risks (45%)
2.2.1.2 KRI thresholds are quantitatively defined for all risks (45%)
2.2.1.3 KRIs also identify and monitor changes in the level of risk in the external environment (10%)
2.2.2 Key Control Indicators (KCI) (30%)
2.2.2.1 Containment KCIs (35%)
2.2.2.1.1 All KRI thresholds have corresponding qualitative containment KCI thresholds (50%)
2.2.2.1.2 All KRI thresholds have corresponding quantitative containment KCI thresholds (50%)
2.2.2.2 Deployment KCIs (35%)
2.2.2.2.1 All KRI thresholds have corresponding qualitative deployment KCI thresholds (50%)
2.2.2.2.2 All KRI thresholds have corresponding quantitative deployment KCI thresholds (50%)
2.2.2.3 For advanced KRIs, assurance process KCIs are defined (30%)
2.2.3 Pairs of thresholds are grounded in risk modeling to show that risks remain below the tolerance (20%)
2.2.4 Policy to put development on hold if the required KCI threshold cannot be achieved, until sufficient controls are implemented to meet the threshold (20%)
3. Risk Treatment
3.1 Implementing Mitigation Measures (50%)
3.1.1 Containment measures (35%)
3.1.1.1 Containment measures are precisely defined for all KCI thresholds (60%)
3.1.1.2 Proof that containment measures are sufficient to meet the thresholds (40%)
3.1.1.3 Strong third party verification process to verify that the containment measures meet the threshold (100% if 3.1.1.3 > [60% x 3.1.1.1 + 40% x 3.1.1.2])
3.1.2 Deployment measures (35%)
3.1.2.1 Deployment measures are precisely defined for all KCI thresholds (60%)
3.1.2.2 Proof that deployment measures are sufficient to meet the thresholds (40%)
3.1.2.3 Strong third party verification process to verify that the deployment measures meet the threshold (100% if 3.1.2.3 > [60% x 3.1.2.1 + 40% x 3.1.2.2])
3.1.3 Assurance processes (30%)
3.1.3.1 Credible plans towards the development of assurance properties (40%)
3.1.3.2 Evidence that the assurance properties are enough to achieve their corresponding KCI thresholds (40%)
3.1.3.3 The underlying assumptions that are essential for their effective implementation and success are clearly outlined (20%)
3.2 Continuous Monitoring and Comparing Results with Pre-determined Thresholds (50%)
3.2.1 Monitoring of KRIs (40%)
3.2.1.1 Justification that elicitation methods used during the evaluations are comprehensive enough to match the elicitation efforts of potential threat actors (30%)
3.2.1.2 Evaluation frequency (25%)
3.2.1.3 Description of how post-training enhancements are factored into capability assessments (15%)
3.2.1.4 Vetting of protocols by third parties (15%)
3.2.1.5 Replication of evaluations by third parties (15%)
3.2.2 Monitoring of KCIs (40%)
3.2.2.1 Detailed description of evaluation methodology and justification that KCI thresholds will not be crossed unnoticed (40%)
3.2.2.2 Vetting of protocols by third parties (30%)
3.2.2.3 Replication of evaluations by third parties (30%)
3.2.3 Transparency of evaluation results (10%)
3.2.3.1 Sharing of evaluation results with relevant stakeholders as appropriate (85%)
3.2.3.2 Commitment to non-interference with findings (15%)
3.2.4 Monitoring for novel risks (10%)
3.2.4.1 Identifying novel risks post-deployment: the company engages in some post-deployment process explicitly for identifying novel risk domains or novel risk models within known risk domains (50%)
3.2.4.2 Mechanism to incorporate novel risks identified post-deployment (50%)
4. Risk Governance
4.1 Decision-making (25%)
4.1.1 The company has clearly defined risk owners for every key risk identified and tracked (25%)
4.1.2 The company has a dedicated risk committee at the management level that meets regularly (25%)
4.1.3 The company has defined protocols for how to make go/no-go decisions (25%)
4.1.4 The company has defined escalation procedures in case of incidents (25%)
4.2 Advisory and Challenge (20%)
4.2.1 The company has an executive risk officer with sufficient resources (16.7%)
4.2.2 The company has a committee advising management on decisions involving risk (16.7%)
4.2.3 The company has an established system for tracking and monitoring risks (16.7%)
4.2.4 The company has designated people that can advise and challenge management on decisions involving risk (16.7%)
4.2.5 The company has an established system for aggregating risk data and reporting on risk to senior management and the Board (16.7%)
4.2.6 The company has an established central risk function (16.7%)
4.3 Audit (20%)
4.3.1 The company has an internal audit function involved in AI governance (50%)
4.3.2 The company involves external auditors (50%)
4.4 Oversight (20%)
4.4.1 The Board of Directors of the company has a committee that provides oversight over all decisions involving risk (50%)
4.4.2 The company has other governing bodies outside of the Board of Directors that provide oversight over decisions (50%)
4.5 Culture (10%)
4.5.1 The company has a strong tone from the top (33.3%)
4.5.2 The company has a strong risk culture (33.3%)
4.5.3 The company has a strong speak-up culture (33.3%)
4.6 Transparency (5%)
4.6.1 The company reports externally on what their risks are (33.3%)
4.6.2 The company reports externally on what their governance structure looks like (33.3%)
4.6.3 The company shares information with industry peers and government bodies (33.3%)
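
The percentages in the outline above read as weights: each criterion contributes that share to the score of its parent. The sketch below illustrates one way such a roll-up could work. It is not the published methodology. The `Criterion` class, the `rollup` function, the weight normalization (so that rounded weights such as 33% + 33% + 33% still sum to 1), and every score value are assumptions made for illustration. The `override` field encodes one reading of the rule attached to 3.1.1.3 and 3.1.2.3, namely that the third party verification score counts for 100% when it exceeds the weighted sum of its sibling criteria.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Criterion:
    """One node of the rating tree (hypothetical structure, not the authors' code)."""
    name: str
    weight: float                       # share of the parent's score, e.g. 0.40 for "(40%)"
    score: Optional[float] = None       # leaf score in [0, 1]; all values here are made up
    children: list["Criterion"] = field(default_factory=list)
    override: Optional["Criterion"] = None  # counts for 100% if it beats the weighted sum


def rollup(node: Criterion) -> float:
    """Return a node's score as the weighted average of its children's scores."""
    if not node.children:
        if node.score is None:
            raise ValueError(f"leaf {node.name!r} has no score")
        return node.score
    total_weight = sum(child.weight for child in node.children)
    # Assumption: normalize by the total weight so rounded percentages still work.
    weighted = sum(child.weight * rollup(child) for child in node.children) / total_weight
    if node.override is not None:
        # Assumed reading of "100% if 3.1.1.3 > [60% x 3.1.1.1 + 40% x 3.1.1.2]".
        weighted = max(weighted, rollup(node.override))
    return weighted


# Plain weighted case: 70% internal and 30% third party red teaming.
unknown_risks = Criterion("1.2 Identification of Unknown Risks", 0.20, children=[
    Criterion("1.2.1 Internal open-ended red teaming", 0.70, score=0.5),
    Criterion("1.2.2 Third party open-ended red teaming", 0.30, score=0.0),
])

# Override case: third party verification outscores the weighted sum of its siblings.
containment = Criterion(
    "3.1.1 Containment measures", 0.35,
    children=[
        Criterion("3.1.1.1 Measures precisely defined", 0.60, score=0.25),
        Criterion("3.1.1.2 Proof of sufficiency", 0.40, score=0.0),
    ],
    override=Criterion("3.1.1.3 Third party verification", 1.0, score=0.5),
)

print(f"{rollup(unknown_risks):.2f}")  # 0.70 * 0.5 + 0.30 * 0.0 -> 0.35
print(f"{rollup(containment):.2f}")    # max(0.60 * 0.25 + 0.40 * 0.0, 0.5) -> 0.50
```

Because `rollup` is recursive, the same function applies at every depth of the outline, from four-level criteria such as 2.2.2.1.1 up to the top-level categories. The outline does not state how the four top-level categories are weighted against each other, so the sketch stops short of computing an overall rating.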