Total Score: Very Weak (0.7/5)

Risk Identification: Weak (1.5/5)
Risk Tolerance & Analysis: Very Weak (0.1/5)
Risk Mitigation: Very Weak (0.3/5)

Score legend:
0 : Non Existent
0 - 1 : Very Weak
1 - 2 : Weak
2 - 3 : Moderate
3 - 4 : Substantial
4 - 5 : Strong

Risk Identification

In risk identification, we assess whether an AI developer is:

  • Appropriately addressing the risks outlined in the literature.
  • Conducting extensive open-ended red teaming to identify new risks.
  • Leveraging a diverse range of risk identification techniques, including threat modeling where appropriate, to adequately identify new threats.
Risk Identification
0 - No information available.

1 - Some risks are in scope of the risk management process. Some efforts of open-ended red teaming are reported, along with very basic threat and risk modeling.

2 - A number of risks are in the scope of the risk management process, but some important ones are missing. Significant efforts of open-ended red teaming are reported, along with significant threat modeling efforts.

3 - Most of the important and commonly discussed risks are in scope of the risk management process. Consequential red teaming is precisely reported, along with significant threat modeling and use of structured risk identification techniques.

4 - Nearly all the risks covered in the relevant literature are in scope of the risk management process. There is a methodology outlining how structured risk identification across the lifecycle is performed, precisely characterized red teaming (including from external parties) is carried out, along with advanced and broad threat and risk modeling.

5 - There is a comprehensive, continued, and detailed effort to ensure all risks are found and addressed. The red teaming and threat and risk modeling effort is extremely extensive, quantified, jointly integrated with structured risk identification efforts, and conducted with third parties.
Weak (1.5/5)

Best-in-Class

  • Meta developed CyberSecEval 2, the best publicly available benchmark to assess LLM capabilities relevant to cyber offense.
  • Meta created Rainbow Teaming, a state-of-the-art technique to automatically uncover model-specific vulnerabilities.

Highlights

  • In the Llama 3 model card, Meta mentions cybersecurity and CBRNE misuse risks. 
  • Meta has produced research on cybersecurity safety benchmarking in their CyberSecEval 2 paper.
  • Meta performs Rainbow Teaming and red-teaming to discover new risks, as described in the Llama 3 herd of models paper.

Weaknesses

  • Some of Meta's proposed mitigations reveal a lack of basic threat modeling. For instance, their cybersecurity risk mitigation measure asks threat actors to act responsibly on their own. Meta releases a separate model called "Llama Code Shield" to offer "mitigation of insecure code suggestions risk, code interpreter abuse prevention, and secure command execution". However, this mitigation is irrelevant in a misuse scenario, as threat actors will simply not use Llama Code Shield.
  • Meta addresses additional risks in their fine-tuned Llama models. However, since they also release the base model, most of these mitigation measures lack grounding in threat modeling. Malicious actors will use the base model without the fine-tuned mitigations, rendering these safety measures ineffective in real-world scenarios. An exception to this is the mitigation of biases, as relevant risk models involve non-malevolent actors unintentionally inducing biases in society through the deployment of an AI model. In this case, the mitigation is more effective, as these actors are likely to use the fine-tuned model with bias reduction measures in place.

Risk Tolerance & Analysis

In risk tolerance and analysis, we assess whether the AI developers have defined:

  • A global risk tolerance.
  • Operational capability thresholds and their corresponding risk levels. These must be defined with precision and breadth.
  • Corresponding objectives of risk mitigation measures: AI developers should establish clear objectives for risk mitigation measures. These objectives should be grounded in strong rationales, including threat modeling, to justify that they are sufficient to address the identified risks and align with the organization's risk tolerance.
  • Evaluation protocols detailing procedures for measuring the model's capabilities and ensuring that capability thresholds are not exceeded without detection.
Global Risk Tolerance
0 - No information available.

1 - Global risk tolerance is qualitatively defined.
E.g., “Our system should not increase the likelihood of extinction risks”.

2 - Global risk tolerance is quantitatively defined for casualties.

3 - Global risk tolerance is quantitatively defined for casualties and economic damages, with adequate ranges and rationale for the decision.

4 - Global risk tolerance is quantitatively defined for casualties, economic damages, and other high-severity risks (e.g., large-scale manipulation of public opinion), with robust methodology and decision-making processes to decide the tolerance (e.g., public consultation).

5 - Global risk tolerance is clearly and quantitatively defined for all significant threats and risks known in the literature. Any significant deviations in risk tolerance from industry norms are clearly justified and explained (e.g., through a comprehensive benefit/cost analysis).
Non Existent (0/5)

Global Risk Tolerance

Weaknesses

  • Meta does not state any global risk tolerance, even qualitatively.
Operational Risk Tolerance
0 - No information available.

1 - Some important capability thresholds are qualitatively defined and their corresponding mitigation objectives are qualitatively defined as well.

2 - Some important capability thresholds are precisely defined, and their corresponding mitigations are precisely defined as well.

3 - Almost all important hazardous capability thresholds and their corresponding mitigation objectives are precisely defined and grounded in extensive threat and risk modeling.

4 - All hazardous capabilities are precisely defined. The corresponding mitigation objectives are quantitatively defined and grounded in extensive threat and risk modeling. Assurance property targets are operationalized.
 
5 - All hazardous capabilities have a precisely defined threshold. Corresponding mitigation objectives are quantified and grounded in comprehensive threat and risk modeling with a clear and in-depth methodology. Assurance property targets are operationalized and justified.
Non Existent (0/5)

Operational Risk Tolerance

Weaknesses

  • Meta does not define any capability thresholds or mitigation objective thresholds for their base Llama 3 model. They do not specify the level of dangerous capabilities that would be unacceptable for publicly releasing weights. Additionally, Meta does not mention the required level of risk reduction that must be achieved through safety measures before releasing the model.
Evaluation Protocols
0 - No information available.

1 - Elements of the evaluation methodologies are described. The testing frequency is defined in terms of multiples of compute.

2 - The testing frequency is defined in terms of multiples of compute and there is a commitment to following it. The evaluation protocol is well-defined and includes relevant elicitation techniques. Independent third parties conduct pre-deployment evaluations with API access.

3 - The testing frequency is defined in terms of both multiples of compute and time, and there is a commitment to following it. The evaluation protocol is well-defined and incorporates state-of-the-art elicitation techniques. A justification is provided demonstrating that these techniques are comprehensive enough to elicit capabilities that could be found and exercised by external actors. AI developers implement and justify measures (such as appropriate safety buffers) to ensure protocols can effectively detect capability threshold crossings. Independent third parties conduct pre-deployment evaluations with fine-tuning access.

4 - The testing frequency is defined in terms of both multiples of compute and time. There is a commitment to following it, and a rationale is provided for why this chosen frequency is sufficient to detect significant capability changes. The evaluation protocol is well-defined and includes state-of-the-art elicitation techniques. The protocols are vetted by third parties to ensure that they are sufficient to detect threshold trespassing.

5 - The testing frequency is defined in terms of both multiples of compute and time. There is a commitment to following it and a rationale is provided for why this chosen frequency is sufficient to detect significant capability changes. The evaluation protocol is well-defined and includes relevant elicitation techniques. The protocols are vetted by third parties to ensure that they are sufficient to detect threshold trespassing and third parties are granted permission and resources to independently run their own evaluations, to verify the accuracy of the evaluation results.
Very Weak (0.5/5)

Tolerance & Analysis Score = 1/4 × Global Risk Tolerance + 1/2 × Operational Risk Tolerance + 1/4 × Evaluation Protocols
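As a quick illustration of this weighted average (a minimal sketch; the function name is ours, and the inputs are the sub-scores reported on this page: Global Risk Tolerance 0, Operational Risk Tolerance 0, Evaluation Protocols 0.5):

```python
def tolerance_and_analysis_score(global_tol: float,
                                 operational_tol: float,
                                 eval_protocols: float) -> float:
    """Weighted average with the stated 1/4, 1/2, 1/4 weights."""
    return 0.25 * global_tol + 0.5 * operational_tol + 0.25 * eval_protocols

# Meta's sub-scores from this assessment: 0, 0, and 0.5 out of 5.
score = tolerance_and_analysis_score(0.0, 0.0, 0.5)
print(round(score, 1))  # 0.1
```

Rounded to one decimal, this reproduces the 0.1/5 category score shown above.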
Evaluation Protocols

Highlights

  • Meta has performed uplift testing for cyber attacks and chemical and biological weapons. The chemical and biological weapons testing specifies some elicitation techniques such as the use of Python, Wolfram Alpha, and RAG. CBRNE subject matter experts validated the results.

Weaknesses

  • Meta does not communicate any specific testing frequency for their evaluation protocols.
  • Due to the lack of defined capability thresholds, Meta's evaluation protocols lack clear evaluation targets for determining when model capabilities become potentially dangerous.


Risk Mitigation

In risk mitigation, we assess whether:

  • The proposed risk mitigation measures, which include both deployment and containment strategies, are well-planned and clearly specified.
  • There is a strong case for assurance properties to actually reduce risks, and the assumptions these properties are operating under are clearly stated.
Containment Measures
0 - No information available.

1 - Vague description of the countermeasures and no commitment to follow them. No evidence that they are sufficient to reduce risks below defined levels.
 
2 - Clearly defined countermeasures are planned to be used by default. There is preliminary qualitative evidence of effectiveness.

3 - Sufficiency is demonstrated through self-reporting, or by using methods that have been shown to be highly effective in similar contexts. Evaluations required to assess future sufficiency are under development (with a conditional policy to stop development or deployment if they are not met), or there is a commitment to use methods that have been shown to be effective in future contexts.

4 - Third-parties have certified the effectiveness of a fixed set of countermeasures against current and near-future threats, and check that current efforts are on track to sufficiently mitigate the risk from future systems.

5 - Concrete countermeasures are described and vetted. There is a commitment to apply them beyond certain risk thresholds, and there is broad consensus that they are sufficient to reduce risk for both current and future systems.
Non Existent (0/5)

Containment Measures

Weaknesses

  • Meta does not provide any specific information on their containment measures. It is important to note that containment measures would only be relevant if their pre-deployment testing measures were comprehensive and effective. This would involve using clear risk thresholds and rigorous testing procedures to establish a well-defined decision-making framework for determining whether releasing model weights is appropriate.

Deployment Measures
0 - No information available.

1 - Vague description of the countermeasures and no commitment to follow them. No evidence that they are sufficient to reduce risks below defined levels.
 
2 - Clearly defined countermeasures are planned to be used by default. There is preliminary qualitative evidence of effectiveness.

3 - Sufficiency is demonstrated through self-reporting, or by using methods that have been shown to be highly effective in similar contexts. Evaluations required to assess future sufficiency are under development (with a conditional policy to stop development or deployment if they are not met), or there is a commitment to use methods that have been shown to be effective in future contexts.

4 - Third-parties have certified the effectiveness of a fixed set of countermeasures against current and near-future threats, and check that current efforts are on track to sufficiently mitigate the risk from future systems.

5 - Concrete countermeasures are described and vetted. There is a commitment to apply them beyond certain risk thresholds, and there is broad consensus that they are sufficient to reduce risk for both current and future systems.
Very Weak (0.5/5)

Deployment Measures

Highlights

  • Meta has implemented some reporting mechanisms, stating: “we put in place a set of resources including an output reporting mechanism and bug bounty program to continuously improve the Llama technology with the help of the community.”

Weaknesses

  • Meta's decision to publicly release the weights of their large language models means that very few deployment measures are applicable to control or monitor the use of these models once they are released. Note that releasing the weights of a model is not inherently problematic, provided that a thorough threat and risk modeling process has been conducted to assess the potential risks associated with making the model publicly available. This rigor is particularly important given the irreversible nature of releasing the weights.
Assurance Properties
0 - No information available.

1 - Limited pursuit of some assurance properties, sparse evidence of how promising they are to reduce risks.

2 - Pursuit of some assurance properties along with research results indicating that they may be promising. Some of the key assumptions the assurance properties are operating under are stated.

3 - Pursuit of assurance properties, some evidence of how promising they are, and a clear case for one of the research directions being sufficient for a positive safety case. The assumptions the assurance properties are operating under are stated but some important ones are missing.

4 - Pursuit of assurance properties, solid evidence of how promising they are, and a clear case for one of the research directions being sufficient for a positive safety case. All the assumptions the assurance properties are operating under are stated.

5 - Broad consensus that one assurance property is likely to work, is being strongly pursued, and there is a strong case for it to be sufficient. All the assumptions the assurance properties are operating under are clearly stated and justified.
Very Weak (0.5/5)

For Risk Mitigation, the three grades (Containment Measures, Deployment Measures, and Assurance Properties) are weighted equally.
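The equal weighting amounts to a simple mean of the three grades (a minimal sketch; the function name is ours, and the inputs are the grades reported on this page: Containment 0, Deployment 0.5, Assurance Properties 0.5):

```python
def risk_mitigation_score(containment: float,
                          deployment: float,
                          assurance: float) -> float:
    """Equal-weight mean of the three Risk Mitigation grades."""
    return (containment + deployment + assurance) / 3

# Meta's grades from this assessment: 0, 0.5, and 0.5 out of 5.
score = risk_mitigation_score(0.0, 0.5, 0.5)
print(round(score, 1))  # 0.3
```

Rounded to one decimal, this reproduces the 0.3/5 Risk Mitigation score shown above.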
Assurance Properties

Highlights

  • Meta is conducting research on a novel architecture called JEPA. Some evidence shows that JEPA learns higher-level representations which could plausibly enhance interpretability and robustness.

Weaknesses

  • Even though JEPA could potentially enhance interpretability and robustness, Meta does not discuss these aspects, nor does it research assurance properties: safety guarantees that become necessary once models achieve dangerous capabilities.

Sections

Best-in-class: These are elements where the company outperforms all the others. They represent industry-leading practices.
Highlights: These are the company's strongest points within the category, justifying its current grade.
Weaknesses: These are the areas that prevent the company from achieving a higher score.