2.2.2.1.1 All KRI thresholds have corresponding qualitative containment KCI thresholds (50%) 75%
Each of the misuse KRIs has a qualitative containment KCI threshold tied to the RAND security levels, though with a vague qualifier: “at a level generally aligned with RAND SL 2.” It is especially good that some reasoning is given for each containment measure. However, containment KCIs also need to be defined for the Deceptive Alignment KRIs.
Quotes:
From Table 1, p. 5:
CBRN, Uplift Level 1: “Security controls and detections at a level generally aligned with RAND SL 2. The potential magnitude of harm these capabilities may enable means the exfiltration and leak of model weights reaching this CCL could be highly damaging. However, low-resourced actors are unlikely to pose a substantial exfiltration threat.”
Cyber, Autonomy Level 1: “Security controls and detections at a level generally aligned with RAND SL 2. Harmful cyberattacks against organizations with limited security posture can already be carried out by individuals with limited expertise, but the automation of such attacks would significantly lower the costs of doing so. Exfiltration of model weights could enable the execution of such attacks at scale. However, cybersecurity may improve correspondingly when models reach such capability levels. The relatively ambiguous net costs of exfiltration count against security levels with higher costs to innovation.”
Cyber, Uplift Level 1: “Security controls and detections at a level generally aligned with RAND SL 2. A model at this capability level could help fairly well-resourced threat actors carry out severe cyber attacks on targets like critical businesses, national government entities, and critical national infrastructure with lower resource expenditure, potentially increasing the frequency of such attacks significantly. However, as above, cyber defense may improve to diminish the impact of AI-assisted cyber attacks. Similarly, the ambiguous net costs of exfiltration count against security levels with higher costs to innovation.”
Machine Learning R&D, Uplift Level 1: “Security controls and detections at a level generally aligned with RAND SL 3. Unrestricted access to models at this level of capability could significantly increase an actor’s ability to progress to yet more powerful models and other critical capabilities. The exfiltration of such a model may therefore have a significant effect on society’s ability to adapt to and govern powerful AI systems, effects that may have long-lasting consequences. Substantially strengthened security is therefore recommended.”
Machine Learning R&D, Autonomy Level 1: “We recommend security controls and detections at a level generally aligned with RAND SL 4 for this capability threshold, but emphasize that this must be taken on by the frontier AI field as a whole. Unrestricted access to models at this level of capability could give an actor with adequate computational resources the ability to reach capabilities much more powerful than those in the other CCLs listed in a short amount of time. This could be catastrophic if there is no effective way of defending against rapidly improving and potentially superhuman AI systems wielded by threat actors. Therefore, we recommend models at this level of capability have exceptional security even though they may have substantial innovation costs.”
2.2.2.1.2 All KRI thresholds have corresponding quantitative containment KCI thresholds (50%) 50%
Each of the misuse KRIs references the RAND security levels as the relevant containment KCI, though with a vague qualifier: “at a level generally aligned with RAND SL 2.”
These RAND levels count somewhat as quantitative containment KCIs, but they would need to be coupled with probabilities to be fully quantitative. For instance, the RAND levels state criteria such as: “A system that can likely thwart most professional opportunistic efforts by attackers that execute moderate effort or non-targeted attacks (OC2).” The attacker is defined quantitatively (“Operations roughly less capable than or comparable to a single individual who is broadly capable in information security spending several weeks with a total budget of up to $10,000 on the specific operation, with preexisting personal cyber infrastructure but no preexisting access to the organization”), but “likely” remains undefined and could be specified as a quantitative probability.
It is especially good that some reasoning is given for each containment measure. However, containment KCIs also need to be defined for the Deceptive Alignment KRIs.
Quotes:
From Table 1, p. 5:
CBRN, Uplift Level 1: “Security controls and detections at a level generally aligned with RAND SL 2. The potential magnitude of harm these capabilities may enable means the exfiltration and leak of model weights reaching this CCL could be highly damaging. However, low-resourced actors are unlikely to pose a substantial exfiltration threat.”
Cyber, Autonomy Level 1: “Security controls and detections at a level generally aligned with RAND SL 2. Harmful cyberattacks against organizations with limited security posture can already be carried out by individuals with limited expertise, but the automation of such attacks would significantly lower the costs of doing so. Exfiltration of model weights could enable the execution of such attacks at scale. However, cybersecurity may improve correspondingly when models reach such capability levels. The relatively ambiguous net costs of exfiltration count against security levels with higher costs to innovation.”
Cyber, Uplift Level 1: “Security controls and detections at a level generally aligned with RAND SL 2. A model at this capability level could help fairly well-resourced threat actors carry out severe cyber attacks on targets like critical businesses, national government entities, and critical national infrastructure with lower resource expenditure, potentially increasing the frequency of such attacks significantly. However, as above, cyber defense may improve to diminish the impact of AI-assisted cyber attacks. Similarly, the ambiguous net costs of exfiltration count against security levels with higher costs to innovation.”
Machine Learning R&D, Uplift Level 1: “Security controls and detections at a level generally aligned with RAND SL 3. Unrestricted access to models at this level of capability could significantly increase an actor’s ability to progress to yet more powerful models and other critical capabilities. The exfiltration of such a model may therefore have a significant effect on society’s ability to adapt to and govern powerful AI systems, effects that may have long-lasting consequences. Substantially strengthened security is therefore recommended.”
Machine Learning R&D, Autonomy Level 1: “We recommend security controls and detections at a level generally aligned with RAND SL 4 for this capability threshold, but emphasize that this must be taken on by the frontier AI field as a whole. Unrestricted access to models at this level of capability could give an actor with adequate computational resources the ability to reach capabilities much more powerful than those in the other CCLs listed in a short amount of time. This could be catastrophic if there is no effective way of defending against rapidly improving and potentially superhuman AI systems wielded by threat actors. Therefore, we recommend models at this level of capability have exceptional security even though they may have substantial innovation costs.”