Total Score:
Very Weak

0.1/5


Risk Identification

Very Weak  

0.25/5

0 : Non Existent
0 - 1 : Very Weak
1 - 2 : Weak
2 - 3 : Moderate
3 - 4 : Substantial
4 - 5 : Strong
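
The legend above can be read as a simple lookup from a numeric score to a rating label. The sketch below is illustrative only: `rating_label` is a hypothetical name, and because the legend does not say which band a boundary value such as exactly 1 belongs to, the code assumes it falls in the lower band.

```python
def rating_label(score: float) -> str:
    """Map a 0-5 score to a rating label from the legend.

    Assumption: a score exactly on a shared boundary (1, 2, 3, 4)
    is assigned to the lower band; the legend leaves this open.
    """
    if not 0 <= score <= 5:
        raise ValueError("score must be between 0 and 5")
    if score == 0:
        return "Non Existent"
    # (upper bound, label) pairs in ascending order
    bands = [
        (1, "Very Weak"),
        (2, "Weak"),
        (3, "Moderate"),
        (4, "Substantial"),
        (5, "Strong"),
    ]
    for upper, label in bands:
        if score <= upper:
            return label
    raise AssertionError("unreachable: score was range-checked above")
```

Under this reading, the 0.25/5 reported for Risk Identification maps to "Very Weak" and a 0/5 maps to "Non Existent", matching the labels shown on the page.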

Risk Tolerance & Analysis

Non Existent 

0/5


Risk Mitigation

Non Existent  

0/5


Risk Identification

In risk identification, we assess whether an AI developer is:

  • Appropriately addressing the risks outlined in the literature.
  • Doing extensive open-ended red teaming to identify new risks.
  • Leveraging a diverse range of risk identification techniques, including threat modeling when appropriate, to adequately identify new threats.
Risk Identification
0 - No information available.

1 - Some risks are in scope of the risk management process. Some efforts of open-ended red teaming are reported, along with very basic threat and risk modeling.

2 - A number of risks are in the scope of the risk management process, but some important ones are missing. Significant efforts of open-ended red teaming are reported, along with significant threat modeling efforts.

3 - Most of the important and commonly discussed risks are in scope of the risk management process. Consequential red teaming is precisely reported, along with significant threat modeling and use of structured risk identification techniques.

4 - Nearly all the risks covered in the relevant literature are in scope of the risk management process. There is a methodology outlining how structured risk identification across the lifecycle is performed, precisely characterized red teaming (including from external parties) is carried out, along with advanced and broad threat and risk modeling.

5 - There is a comprehensive, continued, and detailed effort to ensure all risks are found and addressed. The red teaming and threat and risk modeling effort is extremely extensive, quantified, jointly integrated with structured risk identification efforts, and conducted with third parties.
Very Weak  

0.25/5

Highlights

  • Mistral analyzes bias in their first mixture-of-experts model, 'Mixtral of Experts', using the BOLD and BBQ benchmarks, comparing results to Llama 2.

Weaknesses

  • Mistral has not discussed bias or any other risks in releases following 'Mixtral of Experts' (December 11, 2023).
  • Mistral provides no evidence of open-ended red teaming, threat modeling, or other risk identification techniques.
  • The only mitigation measure discussed by Mistral demonstrates a lack of threat and risk modeling. For their first model, 'Mistral 7B', they introduced a system prompt to reduce harmful outputs. However, this approach is ineffective against misuse, as malicious actors can simply omit the prompt.

Risk Tolerance & Analysis

In risk tolerance and analysis, we assess whether the AI developers have defined:

  • A global risk tolerance.
  • Operational capability thresholds and their equivalent risk, defined with precision and breadth.
  • Corresponding objectives of risk mitigation measures: AI developers should establish clear objectives for risk mitigation measures. These objectives should be grounded in strong rationales, including threat modeling, to justify that they are sufficient to address the identified risks and align with the organization's risk tolerance.
  • Evaluation protocols detailing procedures for measuring the model's capabilities and ensuring that capability thresholds are not exceeded without detection.
Global Risk Tolerance
0 - No information available.

1 - Global risk tolerance is qualitatively defined.
E.g., “Our system should not increase the likelihood of extinction risks”.

2 - Global risk tolerance is quantitatively defined for casualties.

3 - Global risk tolerance is quantitatively defined for casualties and economic damages, with adequate ranges and rationale for the decision.

4 - Global risk tolerance is quantitatively defined for casualties, economic damages, and other high-severity risks (e.g., large-scale manipulation of public opinion), with robust methodology and decision-making processes to decide the tolerance (e.g., public consultation).

5 - Global risk tolerance is clearly and quantitatively defined for all significant threats and risks known in the literature. Any significant deviations in risk tolerance from industry norms are clearly justified and explained (e.g., through a comprehensive benefit/cost analysis).
Non Existent 

0/5

Global Risk Tolerance

Weaknesses

  • Mistral does not state any global risk tolerance, even qualitatively.
Operational Risk Tolerance
0 - No information available.

1 - Some important capability thresholds are qualitatively defined and their corresponding mitigation objectives are qualitatively defined as well.

2 - Some important capability thresholds are precisely defined, and their corresponding mitigations are precisely defined as well.

3 - Almost all important hazardous capability thresholds and their corresponding mitigation objectives are precisely defined and grounded in extensive threat and risk modeling.

4 - All hazardous capabilities are precisely defined. The corresponding mitigation objectives are quantitatively defined and grounded in extensive threat and risk modeling. Assurance property targets are operationalized.
 
5 - All hazardous capabilities have a precisely defined threshold. Corresponding mitigation objectives are quantified and grounded in comprehensive threat and risk modeling with a clear and in-depth methodology. Assurance property targets are operationalized and justified.
Non Existent 

0/5

Operational Risk Tolerance

Weaknesses

  • Mistral provides no information on capability thresholds, mitigation objectives, or assurance properties.

Evaluation Protocols
0 - No information available.

1 - Elements of the evaluation methodologies are described. The testing frequency is defined in terms of multiples of compute.

2 - The testing frequency is defined in terms of multiples of compute and there is a commitment to following it. The evaluation protocol is well-defined and includes relevant elicitation techniques. Independent third parties conduct pre-deployment evaluations with API access.

3 - The testing frequency is defined in terms of both multiples of compute and time and there is a commitment to following it. The evaluation protocol is well-defined and incorporates state-of-the-art elicitation techniques. A justification is provided demonstrating that these techniques are comprehensive enough to elicit capabilities that could be found and exercised by external actors. AI developers implement and justify measures (such as appropriate safety buffers) to ensure protocols can effectively detect capability threshold crossings. Independent third parties conduct pre-deployment evaluations with fine-tuning access.

4 - The testing frequency is defined in terms of both multiples of compute and time. There is a commitment to following it, and a rationale is provided for why this chosen frequency is sufficient to detect significant capability changes. The evaluation protocol is well-defined and includes state-of-the-art elicitation techniques. The protocols are vetted by third parties to ensure that they are sufficient to detect threshold trespassing.

5 - The testing frequency is defined in terms of both multiples of compute and time. There is a commitment to following it, and a rationale is provided for why this chosen frequency is sufficient to detect significant capability changes. The evaluation protocol is well-defined and includes relevant elicitation techniques. The protocols are vetted by third parties to ensure that they are sufficient to detect threshold trespassing, and third parties are granted permission and resources to independently run their own evaluations to verify the accuracy of the evaluation results.
Non Existent 

0/5

Tolerance & Analysis Score = 1/4 × Global Risk Tolerance + 1/2 × Operational Risk Tolerance + 1/4 × Evaluation Protocols
Evaluation Protocols

Weaknesses

  • Mistral provides no information about evaluation protocols for dangerous capabilities.
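
The Tolerance & Analysis weighting stated above (1/4, 1/2, 1/4) can be sketched as a small function. This is a minimal illustration, not the raters' actual tooling; the function and parameter names are hypothetical, and the input range check is an assumption based on the 0 to 5 scale used throughout the page.

```python
def tolerance_analysis_score(
    global_risk_tolerance: float,
    operational_risk_tolerance: float,
    evaluation_protocols: float,
) -> float:
    """Weighted combination of the three sub-scores (each on a 0-5 scale)."""
    for value in (global_risk_tolerance, operational_risk_tolerance, evaluation_protocols):
        if not 0 <= value <= 5:
            raise ValueError("each sub-score must be between 0 and 5")
    # Weights as stated on the page: 1/4, 1/2, 1/4
    return (
        0.25 * global_risk_tolerance
        + 0.5 * operational_risk_tolerance
        + 0.25 * evaluation_protocols
    )
```

With the three sub-scores reported for Mistral (0, 0, and 0), the function returns 0, consistent with the 0/5 shown for Risk Tolerance & Analysis.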

Risk Mitigation

In risk mitigation, we assess whether:

  • The proposed risk mitigation measures, which include both deployment and containment strategies, are well-planned and clearly specified.
  • There is a strong case that the assurance properties actually reduce risks, and the assumptions these properties are operating under are clearly stated.
Containment Measures
0 - No information available.

1 - Vague description of the countermeasures and no commitment to follow them. No evidence that they are sufficient to reduce risks below defined levels.
 
2 - Clearly defined countermeasures are planned to be used by default. There is preliminary qualitative evidence of effectiveness.

3 - Sufficiency is demonstrated through self-reporting, or by using methods that have been shown to be highly effective in similar contexts. Evaluations required to assess future sufficiency are under development (with a conditional policy to stop development or deployment if not met) or there is a commitment to use methods that have been shown to be effective in future contexts.

4 - Third parties have certified the effectiveness of a fixed set of countermeasures against current and near-future threats, and check that current efforts are on track to sufficiently mitigate the risk from future systems.

5 - Concrete countermeasures are described and vetted. There is a commitment to apply them beyond certain risk thresholds, and there is broad consensus that they are sufficient to reduce risk for both current and future systems.
Non Existent

 0/5

Containment Measures

Weaknesses

  • Mistral describes no containment measures.
Deployment Measures
0 - No information available.

1 - Vague description of the countermeasures and no commitment to follow them. No evidence that they are sufficient to reduce risks below defined levels.
 
2 - Clearly defined countermeasures are planned to be used by default. There is preliminary qualitative evidence of effectiveness.

3 - Sufficiency is demonstrated through self-reporting, or by using methods that have been shown to be highly effective in similar contexts. Evaluations required to assess future sufficiency are under development (with a conditional policy to stop development or deployment if not met) or there is a commitment to use methods that have been shown to be effective in future contexts.

4 - Third parties have certified the effectiveness of a fixed set of countermeasures against current and near-future threats, and check that current efforts are on track to sufficiently mitigate the risk from future systems.

5 - Concrete countermeasures are described and vetted. There is a commitment to apply them beyond certain risk thresholds, and there is broad consensus that they are sufficient to reduce risk for both current and future systems.
Non Existent

 0/5

Deployment Measures

Weaknesses

  • Mistral describes no threat model–relevant deployment measures.
Assurance Properties
0 - No information available.

1 - Limited pursuit of some assurance properties, with sparse evidence of how promising they are for reducing risks.

2 - Pursuit of some assurance properties along with research results indicating that they may be promising. Some of the key assumptions the assurance properties are operating under are stated.

3 - Pursuit of assurance properties, some evidence of how promising they are, and a clear case for one of the research directions being sufficient for a positive safety case. The assumptions the assurance properties are operating under are stated but some important ones are missing.

4 - Pursuit of assurance properties, solid evidence of how promising they are, and a clear case for one of the research directions being sufficient for a positive safety case. All the assumptions the assurance properties are operating under are stated.

5 - Broad consensus that one assurance property is likely to work, is being strongly pursued, and there is a strong case for it to be sufficient. All the assumptions the assurance properties are operating under are clearly stated and justified.
Non Existent

 0/5

For Risk Mitigation, the three grades (Containment Measures, Deployment Measures, and Assurance Properties) carry equal weight.
Assurance Properties

Weaknesses

  • Mistral provides no information regarding the pursuit of assurance properties.
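
The equal weighting of the three Risk Mitigation grades described above amounts to a plain average. The sketch below illustrates that reading; the function and parameter names are hypothetical, and the range check assumes the 0 to 5 scale used elsewhere on the page.

```python
def risk_mitigation_score(
    containment_measures: float,
    deployment_measures: float,
    assurance_properties: float,
) -> float:
    """Equal-weight average of the three sub-scores (each on a 0-5 scale)."""
    grades = (containment_measures, deployment_measures, assurance_properties)
    for value in grades:
        if not 0 <= value <= 5:
            raise ValueError("each sub-score must be between 0 and 5")
    # Same weight for every grade, i.e. an arithmetic mean
    return sum(grades) / len(grades)
```

With the three grades reported for Mistral (0, 0, and 0), the average is 0, consistent with the 0/5 shown for Risk Mitigation.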
Sections

Best-in-class: These are elements where the company outperforms all the others. They represent industry-leading practices.
Highlights: These are the company's strongest points within the category, justifying its current grade.
Weaknesses: These are the areas that prevent the company from achieving a higher score.

References

The main source of information available for Mistral AI is the ‘news’ page on their website.