2.2.2.2.1 All KRI thresholds have corresponding qualitative deployment KCI thresholds (50%) 50%
The KRI thresholds clearly require DML 2 (Deployment Mitigation Level 2) if triggered. DML 2 is clearly defined in qualitative terms: “Even a determined actor should not be able to reliably elicit CBRN weapons advice or use the model to automate powerful cyberattacks including malware generation as well as misinformation campaigns, fraud material, illicit video/text/image generation via jailbreak techniques overriding the internal guardrails and supplemental security products.”
More detail would be useful on what constitutes a “determined actor”, “reliably elicit”, or “powerful cyberattacks.” It is also unclear whether DML 2 must be implemented if, say, the Biological threats KRI is triggered but the Offensive cybersecurity KRI is not.
DML 1, 3, and 4 are also defined, though they are not linked to a specific KRI threshold, and they too could use more detail.
There is a commitment to further develop DML 3 once the KRI is reached, but without justification that this will happen sufficiently far in advance of when DML 3 is needed.
Quotes:
“G42’s Frontier Capability Thresholds are defined in the following table. Each capability threshold is associated with a required Deployment Mitigation Level (DML) and Security Mitigation Levels (SML), which must be achieved before the capability threshold is reached. If a necessary Deployment Mitigation Level cannot be achieved, then the model’s deployment must be restricted; if a necessary Security Mitigation Level cannot be achieved, then further capabilities development of the model must be paused. Additionally, if a Frontier Capability Threshold has been reached, G42 will update this Framework to define a more advanced threshold that requires increased deployment (e.g., DML 3) and security mitigations (e.g., SML 3).” (p. 5)
Frontier Capability Threshold for Biological threats and/or Offensive cybersecurity triggers DML 2 and SML 2. (pp. 5-6)
“G42’s Deployment Mitigation Levels are a set of levels, mapped to the Frontier Capability Thresholds, that describe escalating mitigation measures for products deployed externally. These protect against misuse, including through jailbreaking, as models reach higher levels of capability and risk.” (p. 7)
“Deployment Mitigation Level 1: Foundational safeguards, applied to models with minimal hazardous capabilities. Objective: No novel mitigations required on the basis of catastrophically dangerous capabilities” (p. 7)
“Deployment Mitigation Level 2: Intermediate safeguards for models with capabilities requiring focused monitoring. Objective: Even a determined actor should not be able to reliably elicit CBRN weapons advice or use the model to automate powerful cyberattacks including malware generation as well as misinformation campaigns, fraud material, illicit video/text/image generation via jailbreak techniques overriding the internal guardrails and supplemental security products.”
“Deployment Mitigation Level 3: Advanced safeguards for models approaching significant capability thresholds. Objective: Deployment safety should be strong enough to resist sophisticated attempts to jailbreak or otherwise misuse the model.”
“Deployment Mitigation Level 4: Maximum safeguards, designed for high-stakes frontier models with critical functions. Objective: Deployment safety should be strong enough to resist even concerted attempts, with support from state programs, to jailbreak or otherwise misuse the model.”