1.3.1 The company uses risk models for all the risk domains identified and the risk models are published (with potentially dangerous information redacted) (40%) 50%
It is clear they conduct threat modeling for the biological and chemical weapons domain: they outline the required facets for these harms to materialize, separated into planning, circumvention, materials, theory, and methods. It is commendable that they publish this risk model (though it could more concretely map out the causal pathway, as it currently reads more as a list of intervention points), and further that they name the experts with whom they developed it: “These steps were identified in close collaboration with domain matter experts at SecureBio, NIST, RAND, and EBRC.”
For malicious use risks (which presumably include CBRN risks), they “identify critical steps in major risk scenarios […] to inhibit user progress in advancing through such steps” and “[work] with a variety of governmental bodies, non-governmental organizations, private testing firms, industry peers, and academic researchers to identify such inhibiting steps.” This suggests they conduct risk modeling for CBRN risks.
To improve, they should publish the full risk models for the other risk domains, along with the methodology used to derive them. They should also justify why they believe they have considered all relevant risk pathways: the risk model they give for biological/chemical weapons is only one pathway for materializing harm, and there may be other ways to realize harm within this risk domain (e.g., variants of the pathway they provide).
They should also conduct and publish risk models for their other risk domains, such as loss of control. While they state that “Exact scenarios of loss of control risks are speculative and difficult to precisely specify” and that “while difficult to pinpoint particular risk scenarios, it is generally understood that certain concerning propensities of AI models, such as deception and sycophancy, may heighten the overall risk of such outcomes”, this risk modeling is necessary to ensure their risk management is adequate.
Quotes:
“xAI approaches addressing risks using threat modeling.
To design a bioweapon, a malicious actor must undergo a design process. In this threat model, “ideation” involves actively planning for a biological attack; “design” involves retrieving blueprints for a hazardous agent, such as determining the DNA sequence; “build” consists of the protocols, reagents, and equipment necessary to create the threat; and “test” consists of measuring characteristics or properties of the pathogen of interest. By “learning” from these results and iterating after the test phase, the design can be revised until the threat is released [Nelson and Rose, 2023]. In the setting of biological and chemical weapons, xAI considers 5 critical steps where we restrict xAI models from providing detailed information or substantial assistance:
- Planning: brainstorming ideas or plans for creating a pathogen or chemical weapons or precursors, capable of causing severe harm to humans, animals, or crops
- Circumvention: circumventing existing supply chain controls in order to access:
  - Restricted biological supplies
  - Export controlled chemical or biological equipment
- Materials: acquiring or producing pathogens on the US Select Agents list or Australia Group list, or CWC Schedule I chemicals or precursors
- Theory: understanding molecular mechanisms governing, or methods for altering, certain pathogen traits such as transmissibility and virulence.
- Methods: performing experimental methods specific to animal-infecting pathogens, including:
  - Methods that relate to infecting animals or human-sustaining crops with pathogens or sampling pathogens from animals
  - Methods that relate to pathogen replication in animal cell cultures, tissues, or eggs, including serial passage, viral rescue, and viral reactivation
  - Specific procedures to conduct BSL-3 or BSL-4 work using unapproved facilities and equipment
  - Genetic manipulation of animal-infecting pathogens
  - Quantification of pathogenicity, such as infectious dose, lethal dose, and assays of virus-cell interactions
These steps were identified in close collaboration with domain matter experts at SecureBio, NIST, RAND, and EBRC. xAI restricts its models from providing information that could accelerate user learning related to these steps through the use of AI-powered filters that specifically monitor user conversations for content matching these narrow topics and return a brief message declining to answer when activated.” (pp. 4–5)
“Independent third-party assessments of xAI’s current models on realistic offensive cyber tasks requiring identifying and chaining many exploits in sequence indicate that xAI’s models remain below the offensive cyber abilities of a human professional.” (p. 5)
“xAI has focused on the risks of malicious use and loss of control, which cover many different specific risk scenarios. Risk scenarios become more or less likely depending on different model behaviors. For example, an increase in offensive cyber capabilities heightens the risk of a rogue AI but does not significantly change the risk of enabling a bioterrorism attack.” (p. 1)
“Approach to Mitigating Risks of Malicious Use: Alongside comprehensive evaluations measuring dual-use capabilities, our mitigation strategy for malicious use risks is to identify critical steps in major risk scenarios and implement redundant layers of safeguards in our models to inhibit user progress in advancing through such steps. xAI works with a variety of governmental bodies, non-governmental organizations, private testing firms, industry peers, and academic researchers to identify such inhibiting steps, commonly referred to as bottlenecks, and implement commensurate safeguards to mitigate a model’s ability to assist in accelerating a bad actor’s progress through them.” (pp. 1–2)
“Approach to Mitigating Risks of Loss of Control: Exact scenarios of loss of control risks are speculative and difficult to precisely specify.” (p. 2)
“One of the most salient risks of AI within the public consciousness is the loss of control of advanced AI systems. While difficult to pinpoint particular risk scenarios, it is generally understood that certain concerning propensities of AI models, such as deception and sycophancy, may heighten the overall risk of such outcomes, such as propensities for deception and sycophancy.” (p. 6)
1.3.2 Risk modeling methodology (40%) 2%
1.3.2.1 Methodology precisely defined (70%) 0%
While they mention that “xAI approaches addressing risks using threat modeling” and that they “identify critical steps in major risk scenarios”, no risk modeling methodology is defined, nor is there any indication of one.
Quotes:
No relevant quotes found.
1.3.2.2 Mechanism to incorporate red teaming findings (15%) 0%
There is no mention of a mechanism by which risks identified during open-ended red teaming or evaluations trigger further risk modeling.
Quotes:
No relevant quotes found.
1.3.2.3 Prioritization of severe and probable risks (15%) 10%
There is a focus on mitigating requests that pose a “non-trivial risk of resulting in large-scale violence […]”. This demonstrates an implicit prioritization of risk models with higher severity or probability.
However, there should be a clear statement that the most severe and probable harms are prioritized, with a defined process for doing so. Further, risk models should be published with quantified severity and probability scores, plus the reasoning behind these scores, to provide transparency into this prioritization.
Quotes:
“In this RMF, we particularly focus on requests that pose a foreseeable and non-trivial risk of more than one hundred deaths or over $1 billion in damages from weapons of mass destruction or cyberterrorist attacks on critical infrastructure (“catastrophic malicious use events”).” (p. 3)
“Under this draft risk management framework, Grok would apply heightened safeguards if it receives requests that pose a foreseeable and non-trivial risk of resulting in large-scale violence, terrorism, or the use, development, or proliferation of weapons of mass destruction, including CBRN weapons, and major cyber weapons on critical infrastructure.” (p. 2)
“It is also possible that AIs may develop value systems that are misaligned with humanity’s interests and inflict widespread harms upon the public.” (p. 6)
1.3.3 Third party validation of risk models (20%) 50%
While the risk models are not formally validated by third parties, the framework does detail collaboration with third parties such as SecureBio, NIST, RAND, and EBRC; naming these parties counts towards accountability. To improve, there should be an explicit statement that the risk models have been validated by third parties, for example through an external report or formal sign-off/review.
Quotes:
“In the setting of biological and chemical weapons, xAI considers 5 critical steps where we restrict xAI models from providing detailed information or substantial assistance: […] These steps were identified in close collaboration with domain matter experts at SecureBio, NIST, RAND, and EBRC.” (pp. 4–5)
“Approach to Mitigating Risks of Malicious Use: Alongside comprehensive evaluations measuring dual-use capabilities, our mitigation strategy for malicious use risks is to identify critical steps in major risk scenarios and implement redundant layers of safeguards in our models to inhibit user progress in advancing through such steps. xAI works with a variety of governmental bodies, non-governmental organizations, private testing firms, industry peers, and academic researchers to identify such inhibiting steps, commonly referred to as bottlenecks, and implement commensurate safeguards to mitigate a model’s ability to assist in accelerating a bad actor’s progress through them.” (pp. 1–2)