Cohere

Overall score: Very Weak (0.4/5)

Risk Identification: 8%
Risk Analysis and Evaluation: 5%
Risk Treatment: 12%
Risk Governance: 7%

Best in class

  • Cohere has an unusually clear risk prioritization process: they not only assess risks by likelihood and severity, but also describe specifically how they determine which risks to focus on.
Overview
Highlights relative to others

Risk management authority is delegated from the CEO to the Chief Scientist.

Stronger information sharing commitments.

More detailed description of how control indicators are monitored.

Weaknesses relative to others

Lacking most aspects of risk governance, such as Board and management committees, central risk and audit functions, and speak-up culture.

Lacking most aspects of risk management, such as risk thresholds, mitigation thresholds, or a risk tolerance.

Explicit exclusion of certain risks, such as CBRN (Chemical, Biological, Radiological, Nuclear) and autonomous research, on the grounds that they are speculative, with no indication of how that decision was reached or what would change it in the future.

1.1 Classification of Applicable Known Risks (40%) 10%

1.1.1 Risks from literature and taxonomies are well covered (50%) 10%

They do not list ex ante the specific risk domains their risk management process focuses on; rather, risk domains are identified for particular customers and use cases. In practice, their risk domains concentrate on malicious use and bias, with examples in cybersecurity, child sexual exploitation, and discrimination. More detail is needed on why they chose to focus on these issues and how they came to identify these risks, especially as they differ from the industry standard.

They explicitly do not consider CBRN or loss-of-control risks, and explicitly do not consider “potential future risks associated with LLMs”. This is a serious limitation that requires strong justification: given that harms from loss of control or CBRN could be substantial, declining to monitor these risks at all requires a high degree of confidence, yet their justification for the exclusion (assessed in 1.1.2) scores less than 50%. The exclusion also suggests limited engagement with the literature – for instance, these risks are emphasized in documents such as the International Science of AI Safety Report and current drafts of the EU AI Act Codes of Practice.

Quotes:

“One approach to risk assurance in the AI industry is focused on risks described as catastrophic or severe, such as capabilities related to radiological and nuclear weapons, autonomy, and self-replication. In this context, thresholds relating to these potential catastrophic risks are developed, and the approach described in safety frameworks is designed to assess risks that are speculated to arise when models attain specific capabilities, such as the ability to perform autonomous research or facilitate biorisk. The models are then deemed to present “unacceptable” levels of risk when certain capability levels are attained. While it is important to consider long-term, potential future risks associated with LLMs and the systems in which they are deployed, studies regarding the likelihood of these capabilities arising and leading to real-world harm are limited in their methodological maturity and transparency, often lacking clear theoretical threat models or developed empirical methods due to their nascency. For example, existing research into how LLMs may increase biorisks fails to account for entire risk chains beyond access to information, and does not systematically compare LLMs to other information access tools, such as the internet. More work is needed to develop methods for assessing these types of threats more reliably.” (pp. 14-15)

“Cohere’s approach to risk assurance, and to determining when models and systems are sufficiently safe and secure to be made available to our customers, is focused on risks that are known, measurable, or observable today” (p. 15)

“Limitations in training data, such as unrepresentative data distributions, historically outdated representations, or an imbalance between harmful patterns and attributes on the one hand and positive patterns and attributes on the other, also impact model capabilities. If these limitations are not mitigated, models can output harmful content, such as hateful or violent content, or child sexual exploitation and abuse material (CSAM).

We therefore focus our secure AI work on risks that have a high likelihood of occurring based on the types of tasks LLMs are highly performant in, as well as the limitations inherent in how these models function. This is what we refer to as “model capabilities.”

We place potential risks arising from LLM capabilities into one of two categories:

  1. Risks stemming from possible malicious use of foundation AI models, such as generating content to facilitate cybercrime or child sexual exploitation
  2. Risks stemming from possible harmful outputs in the ordinary, non-malicious use of foundation models, such as outputs that are inaccurate in a way that has a harmful impact on a person or a group” (p. 5)

“Cohere consistently reviews state-of-the-art research and industry practice regarding the risks associated with AI, and uses this to determine our priorities. At Cohere, risks to our systems are identified through a list of continuously-expanding techniques, including:

  • Mitigating core vulnerabilities identified by the Open Worldwide Application Security Project (OWASP)
  • Internal threat modeling, which includes a review of how our customers interact with and use our models, to proactively identify potential threats and implement specific counter measures before deployment
  • Monitoring established and well-researched repositories of security attacks and vulnerabilities for AI, such as the Mitre Atlas database
    With these methods, Cohere can identify risks such as data poisoning, model theft, inference attacks, injection attacks, and output manipulation.” (p. 6)

Potential Harm: Outputs that result in a discriminatory outcome, insecure code, child sexual exploitation and abuse, malware.

“The examples provided above consider the likelihood and severity of potential harms in the enterprise contexts in which Cohere models are deployed. A similar assessment of potential harms from the same models deployed in contexts such as a consumer chatbot would result in a different risk profile.” (p. 8)

“Preventing the generation of harmful outputs involves testing and evaluation techniques to control the types of harmful output described in Section 1, for example, child sexual abuse material (CSAM), targeted violence and hate, outputs that result in discriminatory outcomes for protected groups, or insecure code.” (p. 11)

1.1.2 Exclusions are clearly justified and documented (50%) 10%


They explicitly do not consider CBRN or loss-of-control risks, and explicitly do not consider “potential future risks associated with LLMs”, giving the justification that “studies regarding the likelihood of these capabilities arising and leading to real-world harm are limited in their methodological maturity and transparency, often lacking clear theoretical threat models or developed empirical methods due to their nascency.” However, this reasoning requires more documentation and justification – for instance, citing the studies in question and explaining why they consider them methodologically limited. Excluding a risk that is established in taxonomies and literature carries a high burden of proof.

Quotes:

“Cohere’s approach to risk assurance, and to determining when models and systems are sufficiently safe and secure to be made available to our customers, is focused on risks that are known, measurable, or observable today” (p. 15)

“One approach to risk assurance in the AI industry is focused on risks described as catastrophic or severe, such as capabilities related to radiological and nuclear weapons, autonomy, and self-replication. In this context, thresholds relating to these potential catastrophic risks are developed, and the approach described in safety frameworks is designed to assess risks that are speculated to arise when models attain specific capabilities, such as the ability to perform autonomous research or facilitate biorisk. The models are then deemed to present “unacceptable” levels of risk when certain capability levels are attained. While it is important to consider long-term, potential future risks associated with LLMs and the systems in which they are deployed, studies regarding the likelihood of these capabilities arising and leading to real-world harm are limited in their methodological maturity and transparency, often lacking clear theoretical threat models or developed empirical methods due to their nascency. For example, existing research into how LLMs may increase biorisks fails to account for entire risk chains beyond access to information, and does not systematically compare LLMs to other information access tools, such as the internet. More work is needed to develop methods for assessing these types of threats more reliably.” (pp. 14-15)

1.2 Identification of Unknown Risks (Open-ended red teaming) (20%) 0%

1.2.1 Internal open-ended red teaming (70%) 0%

The framework doesn’t mention any pre-deployment procedures to identify novel risk domains or risk models for the frontier model. To improve, they should commit to a process for identifying either novel risk domains or novel risk models/changed risk profiles within pre-specified risk domains (e.g., an extended context length enabling improved zero-shot learning, which changes the risk profile), and specify the methodology, resources, and required expertise.

Quotes:

No relevant quotes found.

1.2.2 Third party open-ended red teaming (30%) 0%

The framework doesn’t mention any third-party procedures pre-deployment to identify novel risk domains or risk models for the frontier model.

There is mention of multi-disciplinary red teaming and consultation of domain experts during the “Training, evaluation and testing” stage of model development. However, this is not explicitly for the purpose of identifying novel risks, and criteria for expertise are not given.

To improve, they should commit to an external process for identifying either novel risk domains or novel risk models/changed risk profiles within pre-specified risk domains (e.g., an extended context length enabling improved zero-shot learning, which changes the risk profile), and specify the methodology, resources, and required expertise.

Quotes:

“Multi-disciplinary red teaming […] Consultation of domain experts” (p. 13)

 

1.3 Risk modeling (40%) 9%

1.3.1 The company uses risk models for all the risk domains identified and the risk models are published (with potentially dangerous information redacted) (40%) 10%

There is some evidence of risk modeling, including consideration of use cases and the potential likelihood and severity of harms from those use cases.

More evidence of a structured process for this risk modeling should be given, including the methodology, the experts involved, and the lists of identified threat scenarios. More detail is required on the step-by-step causal pathway from these scenarios to harm, along with justification that adequate effort has been made to systematically map out all possible risk pathways. Risk models should be published.

Quotes:

“At Cohere, risks to our systems are identified through a list of continuously-expanding techniques, including: […] Internal threat modeling, which includes a review of how our customers interact with and use our models, to proactively identify potential threats and implement specific counter measures before deployment” (p. 6)

 

1.3.2 Risk modeling methodology (40%) 11%

1.3.2.1 Methodology precisely defined (70%) 0%

There is no methodology for risk modeling defined.

Quotes:

No relevant quotes found.

1.3.2.2 Mechanism to incorporate red teaming findings (15%) 0%

No mention of risks identified during open-ended red teaming or evaluations triggering further risk modeling.

Quotes:

No relevant quotes found.

1.3.2.3 Prioritization of severe and probable risks (15%) 75%

There is a clear assessment and subsequent prioritization of risk models representing the most severe and probable harms. This appears to be from the full space of risk models. However, more detail on the scores given for likelihood and severity of different risk models should be published.

Quotes:

“We identify risks by first assessing potential risks arising from our models’ capabilities and the systems in which they may be deployed. We then assess the likelihood and severity of potential harms that may arise in enterprise contexts from the identified risks.” (p. 5)

“We therefore focus our secure AI work on risks that have a high likelihood of occurring based on the types of tasks LLMs are highly performant in, as well as the limitations inherent in how these models function. This is what we refer to as “model capabilities.” (p. 5)

Their assessment lists use case, likelihood of harm in context, and severity of harm in context. For instance: “Insecure Code. Code generation for enterprise developers managing a company’s proprietary data within on-premises servers. Medium to High possibility of a vulnerability being introduced into company code. Medium to High [severity of harm in context], depending on the nature of the vulnerability introduced and the type of data handled by the company. Severe vulnerabilities can leave companies vulnerable to cyber attacks affecting individuals and society.”
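
As an illustration only of the kind of likelihood-severity prioritization this criterion rewards, a minimal sketch follows; the risk entries, ordinal scales, and product scoring rule are assumptions for exposition, not taken from Cohere's framework.

```python
# Minimal sketch of likelihood/severity prioritization (illustrative assumptions only).
LIKELIHOOD = {"low": 1, "medium": 2, "high": 3}
SEVERITY = {"low": 1, "medium": 2, "high": 3}

# (risk model, likelihood in context, severity in context) -- hypothetical entries
risk_models = [
    ("insecure code in enterprise code generation", "high", "high"),
    ("discriminatory outputs affecting protected groups", "medium", "high"),
    ("over-refusal degrading benign use", "medium", "low"),
]

def priority(likelihood: str, severity: str) -> int:
    """Simple product score; higher scores are treated first."""
    return LIKELIHOOD[likelihood] * SEVERITY[severity]

# Rank risk models so the most severe and probable harms are prioritized.
for name, lik, sev in sorted(risk_models, key=lambda r: priority(r[1], r[2]), reverse=True):
    print(f"score={priority(lik, sev)}  {name}  (likelihood={lik}, severity={sev})")
```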

1.3.3 Third party validation of risk models (20%) 0%

There is no evidence that third parties validate risk models.

Quotes:

No relevant quotes found.


2.1 Setting a Risk Tolerance (35%) 3%

2.1.1 Risk tolerance is defined (80%) 3%

2.1.1.1 Risk tolerance is at least qualitatively defined for all risks (33%) 10%

Their effective risk tolerance for when residual risk is “acceptable” is that there are “no significant regressions [demonstrated in evaluations and tests] compared to our previously launched model versions.” Risk tolerances are also allowed to differ by customer: “analysis of whether a model is “acceptable” from a risk management perspective must be adapted to the customer context”.

However, this risk tolerance is vague and leaves Cohere considerable discretion. To improve, they should predefine a risk tolerance that applies to all models, expressed in terms of the probability of harm of a given severity.

Quotes:

“We consider models safe and secure to launch when our evaluations and tests demonstrate no significant regressions compared to our previously launched model versions, so that performance and security is maintained or improved for every new significant model version. This is Cohere’s bright line for determining when a model is “acceptable” from a risk management perspective and ready to be launched.” (p. 16)

“In this way, the analysis of whether a model is “acceptable” from a risk management perspective must be adapted to the customer context, and must be able to adapt to new requirements or needs that emerge post-deployment. Assurance here means working with our customers to ensure that our models and systems conform to their risk management obligations and standards.” (p. 17)

2.1.1.2 Risk tolerance is expressed at least partly quantitatively as a combination of scenarios (qualitative) and probabilities (quantitative) for all risks (33%) 0%

The risk tolerance, implicit or otherwise, is not expressed fully or partly quantitatively. To improve, the risk tolerance should be expressed fully quantitatively or as a combination of scenarios with probabilities.

Quotes:

No relevant quotes found.

2.1.1.3 Risk tolerance is expressed fully quantitatively as a product of severity (quantitative) and probability (quantitative) for all risks (33%) 0%

The risk tolerance, implicit or otherwise, is not expressed fully or partly quantitatively. To improve, the risk tolerance should be expressed fully quantitatively or as a combination of scenarios with probabilities.

Quotes:

No relevant quotes found.
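
For illustration, a risk tolerance expressed fully quantitatively as a product of probability and severity could take the form sketched below; the scenarios, numbers, and tolerance cap are assumptions for exposition, not figures from Cohere's framework.

```python
# Hypothetical quantitative risk tolerance check (all values are illustrative assumptions).
ANNUAL_EXPECTED_HARM_CAP = 1.0  # assumed cap on expected harm units per year

# (scenario, annual probability of occurrence, severity in harm units)
scenarios = [
    ("severe vulnerability shipped via generated code", 0.05, 10.0),
    ("guardrail bypass producing harmful content at scale", 0.05, 40.0),
]

for name, probability, severity in scenarios:
    expected_harm = probability * severity  # probability x severity
    verdict = "within tolerance" if expected_harm <= ANNUAL_EXPECTED_HARM_CAP else "exceeds tolerance"
    print(f"{name}: expected harm {expected_harm:.2f} -> {verdict}")
```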

2.1.2 Process to define the tolerance (20%) 0%

2.1.2.1 AI developers engage in public consultations or seek guidance from regulators where available (50%) 0%

No evidence of engaging in public consultations or seeking guidance from regulators for risk tolerance.

Quotes:

No relevant quotes found.

2.1.2.2 Any significant deviations from risk tolerance norms established in other industries is justified and documented (e.g., cost-benefit analyses) (50%) 0%

There is no justification process and no evidence of considering whether their approach aligns with or deviates from risk tolerance norms established in other industries.

Quotes:

No relevant quotes found.

2.2 Operationalizing Risk Tolerance (65%) 6%

2.2.1 Key Risk Indicators (KRI) (30%) 15%

2.2.1.1 KRI thresholds are at least qualitatively defined for all risks (45%) 10%

KRI assessments appear to be conducted implicitly, but KRI thresholds are not given. To improve, thresholds that would trigger mitigations should be defined. The KRIs should also be grounded in risk modeling.

Quotes:

Key risks:
“Data acquisition and preparation stage:

  • Data poisoning
  • Supply chain vulnerabilities
  • Model theft
  • Insecure plugin design
  • Unrepresentative data distributions
  • Imbalance of data with harmful patterns and attributes vs. positive patterns and attributes
  • Historically outdated representations in data
  • Inaccurate proxies when used to measure representativeness or imbalances” (p. 12)

“Training, evaluations and testing.

  • Data poisoning
  • Data leakage
  • Model theft
  • Adversarial attacks
  • Evaluation criteria and data are not representative of a population
  • Disparate performance in different cases results in disproportionate impact on certain populations
  • Models and data are fit for an aggregated, dominant population but sub-optimal for sub-groups within the population” (p. 13)

“Deployment and maintenance.

  • Prompt injection
  • Insecure output handling
  • Model denial of service
  • Excessive agency
  • Sensitive information disclosure
  • Misuse
  • Unexpected post-deployment usage patterns that were not accounted for and result in unmitigated risk” (p. 13)

“Improvement and further fine-tuning.

  • Prompt injection
  • Insecure input/output handling
  • Model denial of service
  • Excessive agency
  • Sensitive information disclosure
  • Adversarial attacks
  • Evaluation criteria and data are not representative of a population
  • Model design choices amplify performance disparity across different examples in the data” (p. 14)

“Multi-faceted evaluations, including standard benchmarks and proprietary evaluations based on identified possible harms and harm reduction objectives” (p. 13)

2.2.1.2 KRI thresholds are quantitatively defined for all risks (45%) 0%

There is no evidence of KRI thresholds being quantitatively defined.

Quotes:

No relevant quotes found.

2.2.1.3 KRIs also identify and monitor changes in the level of risk in the external environment (10%) 10%

“Unexpected post-deployment usage patterns that were not accounted for and result in unmitigated risk” are described as a key risk to track during the deployment and maintenance stage. However, a threshold which triggers mitigations should be defined.

Quotes:

Key Risks: “Unexpected post-deployment usage patterns that were not accounted for and result in unmitigated risk” (p. 13)

2.2.2 Key Control Indicators (KCI) (30%) 6%

2.2.2.1 Containment KCIs (35%) 13%
2.2.2.1.1 All KRI thresholds have corresponding qualitative containment KCI thresholds (50%) 25%

There is evidence of aligning to a standard, i.e. SOC 2 Type II, but this is not tied to a specific KRI threshold and it is not clear how this threshold differs as model risks vary.

Quotes:

“We align our program to SOC 2 Type II and other recognized frameworks, and we rigorously monitor the health and performance of our security controls throughout the year, performing real-time corrective action when needed.” (p. 9)

2.2.2.1.2 All KRI thresholds have corresponding quantitative containment KCI thresholds (50%) 0%

There is no evidence of a quantitative containment KCI threshold.

Quotes:

No relevant quotes found.

2.2.2.2 Deployment KCIs (35%) 5%
2.2.2.2.1 All KRI thresholds have corresponding qualitative deployment KCI thresholds (50%) 10%

There are “goals” for mitigation practices to reach in general, though these are vague – for instance, “adhering to guardrails” or “minimizing over-refusal”. To improve, these goals (which function as nascent deployment KCI thresholds) should specify in more detail what the criteria of sufficiency would be. They should also be linked to KRIs.

Quotes:

“More specifically, our harm mitigation practices are focused on achieving the following goals:

  • Preventing the generation of harmful outputs in multilingual enterprise use cases
  • Adhering to guardrails
  • Minimizing over-refusal” (p. 11)

“Cohere’s models, their training data, and the guardrails within which they operate are dynamically updated throughout the development process to achieve the three harm mitigation objectives described above.” (p. 11)

2.2.2.2.2 All KRI thresholds have corresponding quantitative deployment KCI thresholds (50%) 0%

There are no quantitative deployment KCI thresholds given.

Quotes:

No relevant quotes found.

2.2.2.3 For advanced KRIs, assurance process KCIs are defined (30%) 0%

There are no assurance process KCIs defined. The framework does not recognize KCIs beyond containment and deployment measures.

Quotes:

No relevant quotes found.

2.2.3 Pairs of thresholds are grounded in risk modeling to show that risks remain below the tolerance (20%) 0%

There is no evidence of reasoning that if KRIs are crossed but KCIs are reached, then risks remain below the risk tolerance.

Quotes:

No relevant quotes found.

2.2.4 Policy to put development on hold if the required KCI threshold cannot be achieved, until sufficient controls are implemented to meet the threshold (20%) 0%

There is no policy to put development or deployment on hold mentioned in the framework.

Quotes:

No relevant quotes found.


3.1 Implementing Mitigation Measures (50%) 12%

3.1.1 Containment measures (35%) 19%

3.1.1.1 Containment measures are precisely defined for all KCI thresholds (60%) 25%

While containment measures are defined, most remain high-level (e.g., “secure, risk-based defaults and internal reviews”, “Supply chain controls for any third parties (e.g., data vendors or third-party data annotation)”, or “Blocklists”). More detail on the measures actually implemented or planned to be implemented is needed to improve. They should also be linked to specific KCI (and thus KRI) thresholds.

Quotes:

“These controls include:

  • Advanced perimeter security controls and real-time threat prevention and monitoring
  • Secure, risk-based defaults and internal reviews
  • Advanced endpoint detection and response across our cloud infrastructure and distributed devices
  • Strict access controls, including multifactor authentication, role-based access control, and just-in-time access, across and within our environment to protect against insider and external threats (internal access to unreleased model weights is even more strenuously restricted)
  • “Secure Product Lifecycle” controls, including security requirements gathering, security risk assessment, security architecture and product reviews, security threat modeling, security scanning, code reviews, penetration testing, and bug bounty programs” (p. 9)

Key Mitigations We Apply:
“Data acquisition and preparation.

  • Detailed data lineage controls, including tracking the source, pre-processing steps, storage location, and access permissions
  • Supply chain controls for any third parties (e.g., data vendors or third-party data annotation)
  • Traditional just-in-time access controls, robust authentication, zero-trust rules, etc.
  • Data pre-processing (including cleaning, analysis, selection, etc.)
  • Re-sampling, re-weighting, and re-balancing datasets to reduce identified representation issues or imbalances” (p. 12)

“Training, evaluations and testing.

  • Multi-disciplinary red teaming
  • Independent third-party security testing, e.g., penetration testing
  • Continuous monitoring to detect anomalies and security issues
  • Multi-disciplinary red teaming
  • Consultation of domain experts
  • Multi-faceted evaluations, including standard benchmarks and proprietary evaluations based on identified possible harms and harm reduction objectives
  • User research of local language and cultural contexts” (p. 13)

“Deployment and maintenance.

  • Blocklists, custom classifiers, and prompt injection guard filters, and human review to detect and intercept attempts to create unsafe outputs
  • Specific mitigations applied based on deployment type, e.g., isolated customer environments with focus on remediating security vulnerabilities that coexist between traditional application security and AI security
  • Security Information and Event Management (SIEM) system leveraging heuristics and advanced detection capabilities to identify potential threats
  • “Air-gapped” safeguards to prevent lateral movement and unintended network calls across environments and kernel-based LLMs to prevent the leaking of shared memories or buffers that could expose sensitive data
  • Blocklists
  • Safety classifiers and human review to detect and intercept attempts to create unsafe outputs
  • Human-interpretable explanation of outputs
  • User research and customer feedback analysis” (p. 13)

“Improvement and further fine-tuning.

  • Responsible Disclosure Policy to incent third-party security vulnerability discovery
  • Specific mitigations applied based on deployment type, e.g., isolated customer environments with focus on remediating security vulnerabilities that coexist between traditional application security and AI security
  • Continuous evaluation and user research
  • Programs to incentivize research, including research grants and participation in external independent research efforts.
  • Multi-disciplinary red teaming” (p. 14)

3.1.1.2 Proof that containment measures are sufficient to meet the thresholds (40%) 10%

While there is a process for identifying weaknesses in containment measures via internal API testing, it is not clear that this occurs prior to their implementation, and it does not cover other aspects of containment, such as securing model weights. To improve, they should provide, in advance, an argument for why they believe the proposed containment measures will be sufficient to meet the KCI threshold.

Quotes:

“Where applicable, we also consider risks within the context of customer deployments. For example, because many of our users start building applications through our application programming interfaces (APIs) before moving to more advanced deployments, we extensively test and secure our APIs. Our API V2 underwent a heavy security design review before we made it available.” (p. 10)

3.1.1.3 Strong third party verification process to verify that the containment measures meet the threshold (100% if 3.1.1.3 > [60% x 3.1.1.1 + 40% x 3.1.1.2]) 10%

While there is a process for identifying weaknesses in containment measures, it is not clear that this occurs prior to their implementation. To improve, they should detail a process for third parties to verify, in advance, the case for why the proposed containment measures will be sufficient to meet the KCI threshold.

Quotes:

“Prior to deployment, significant model releases undergo an independent third-party penetration test to validate the security of containers and models.” (p. 10)

“Independent third-party security testing, e.g., penetration testing” (p. 13)
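
For clarity on the override rule in this criterion's heading, and assuming it means the third-party verification score replaces the weighted combination only when it exceeds it, the arithmetic with the scores above (25%, 10%, 10%) works out as in the sketch below.

```python
# Worked check of the 3.1.1.3 override rule using the scores given in this assessment.
score_3_1_1_1 = 0.25  # containment measures precisely defined
score_3_1_1_2 = 0.10  # proof that measures are sufficient
score_3_1_1_3 = 0.10  # third-party verification

weighted = 0.6 * score_3_1_1_1 + 0.4 * score_3_1_1_2  # = 0.19
override_applies = score_3_1_1_3 > weighted           # False here
effective = score_3_1_1_3 if override_applies else weighted
print(f"weighted={weighted:.2f}, override applies: {override_applies}, effective={effective:.2f}")
```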

3.1.2 Deployment measures (35%) 15%

3.1.2.1 Deployment measures are precisely defined for all KCI thresholds (60%) 25%

While deployment measures are defined, most if not all remain high-level (e.g., “human-interpretable explanation of outputs” or “multi-disciplinary red teaming”). To improve, more detail should be given on the measures actually implemented or planned to be implemented. Further, the measures should be tied to specific KCI thresholds.

Quotes:

Key Mitigations We Apply:
“Data acquisition and preparation.

  • Detailed data lineage controls, including tracking the source, pre-processing steps, storage location, and access permissions
  • Supply chain controls for any third parties (e.g., data vendors or third-party data annotation)
  • Traditional just-in-time access controls, robust authentication, zero-trust rules, etc.
  • Data pre-processing (including cleaning, analysis, selection, etc.)
  • Re-sampling, re-weighting, and re-balancing datasets to reduce identified representation issues or imbalances” (p. 12)

“Training, evaluations and testing.

  • Multi-disciplinary red teaming
  • Independent third-party security testing, e.g., penetration testing
  • Continuous monitoring to detect anomalies and security issues
  • Multi-disciplinary red teaming
  • Consultation of domain experts
  • Multi-faceted evaluations, including standard benchmarks and proprietary evaluations based on identified possible harms and harm reduction objectives
  • User research of local language and cultural contexts” (p. 13)

“Deployment and maintenance.

  • Blocklists, custom classifiers, and prompt injection guard filters, and human review to detect and intercept attempts to create unsafe outputs
  • Specific mitigations applied based on deployment type, e.g., isolated customer environments with focus on remediating security vulnerabilities that coexist between traditional application security and AI security
  • Security Information and Event Management (SIEM) system leveraging heuristics and advanced detection capabilities to identify potential threats
  • “Air-gapped” safeguards to prevent lateral movement and unintended network calls across environments and kernel-based LLMs to prevent the leaking of shared memories or buffers that could expose sensitive data
  • Blocklists
  • Safety classifiers and human review to detect and intercept attempts to create unsafe outputs
  • Human-interpretable explanation of outputs
  • User research and customer feedback analysis” (p. 13)

“Improvement and further fine-tuning.

  • Responsible Disclosure Policy to incent third-party security vulnerability discovery
  • Specific mitigations applied based on deployment type, e.g., isolated customer environments with focus on remediating security vulnerabilities that coexist between traditional application security and AI security
  • Continuous evaluation and user research
  • Programs to incentivize research, including research grants and participation in external independent research efforts.
  • Multi-disciplinary red teaming” (p. 14)

3.1.2.2 Proof that deployment measures are sufficient to meet the thresholds (40%) 0%

No proof is provided that the deployment measures are sufficient to meet the deployment KCI thresholds, nor is there a process to solicit such proof.

Quotes:

No relevant quotes found.

3.1.2.3 Strong third party verification process to verify that the deployment measures meet the threshold (100% if 3.1.2.3 > [60% x 3.1.2.1 + 40% x 3.1.2.2]) 0%

There is no mention of third-party verification of deployment measures meeting the threshold.

Quotes:

No relevant quotes found.

3.1.3 Assurance processes (30%) 0%

3.1.3.1 Credible plans towards the development of assurance properties (40%) 0%

There is an explicit aversion to preparing assurance processes in advance: “Cohere’s approach to risk assurance, and to determining when models and systems are sufficiently safe and secure to be made available to our customers, is focused on risks that are known, measurable, or observable today.” Further, they note that “more work is needed to develop methods for assessing these types of threats more reliably” – to improve, the framework could set out a commitment to contribute to this research effort.

Quotes:

“One approach to risk assurance in the AI industry is focused on risks described as catastrophic or severe, such as capabilities related to radiological and nuclear weapons, autonomy, and self-replication. In this context, thresholds relating to these potential catastrophic risks are developed, and the approach described in safety frameworks is designed to assess risks that are speculated to arise when models attain specific capabilities, such as the ability to perform autonomous research or facilitate biorisk. The models are then deemed to present “unacceptable” levels of risk when certain capability levels are attained. While it is important to consider long-term, potential future risks associated with LLMs and the systems in which they are deployed, studies regarding the likelihood of these capabilities arising and leading to real-world harm are limited in their methodological maturity and transparency, often lacking clear theoretical threat models or developed empirical methods due to their nascency. For example, existing research into how LLMs may increase biorisks fails to account for entire risk chains beyond access to information, and does not systematically compare LLMs to other information access tools, such as the internet. More work is needed to develop methods for assessing these types of threats more reliably.” (pp. 14-15)

“Cohere’s approach to risk assurance, and to determining when models and systems are sufficiently safe and secure to be made available to our customers, is focused on risks that are known, measurable, or observable today” (p. 15)

 

3.1.3.2 Evidence that the assurance properties are enough to achieve their corresponding KCI thresholds (40%) 0%

There is no mention of providing evidence that the assurance processes are sufficient.

Quotes:

No relevant quotes found.

3.1.3.3 The underlying assumptions that are essential for their effective implementation and success are clearly outlined (20%) 0%

There is no mention of the underlying assumptions that are essential for the effective implementation and success of assurance processes.

Quotes:

No relevant quotes found.

3.2 Continuous Monitoring and Comparing Results with Pre-determined Thresholds (50%) 12%

3.2.1 Monitoring of KRIs (40%) 0%

3.2.1.1 Justification that elicitation methods used during the evaluations are comprehensive enough to match the elicitation efforts of potential threat actors (30%) 0%

There is no mention of elicitation methods being comprehensive enough to match elicitation efforts of potential threat actors. Elicitation techniques, such as fine-tuning or scaffolding, are not mentioned.

Quotes:

No relevant quotes found.

3.2.1.2 Evaluation frequency (25%) 0%

While the framework mentions conducting evaluations “throughout the model development cycle”, no further detail is given. The frequency does not appear to be tied to variation in effective compute during training or to fixed time periods.

Quotes:

“As described above, Cohere conducts evaluations throughout the model development cycle, using both internal and external evaluation benchmarks.” (p. 16)

3.2.1.3 Description of how post-training enhancements are factored into capability assessments (15%) 0%

There is no description of how post-training enhancements are factored into capability assessments.

Quotes:

No relevant quotes found.

3.2.1.4 Vetting of protocols by third parties (15%) 0%

There is no mention of having the evaluation methodology vetted by third parties.

Quotes:

No relevant quotes found.

3.2.1.5 Replication of evaluations by third parties (15%) 0%

There is no mention of evaluations being replicated by third parties.

Quotes:

No relevant quotes found.

3.2.2 Monitoring of KCIs (40%) 13%

3.2.2.1 Detailed description of evaluation methodology and justification that KCI thresholds will not be crossed unnoticed (40%) 25%

There is a description of “continuous monitoring of our security controls using automated and manual techniques” and “various evaluations to ensure that models actually adhere to these guardrails.” However, more detail is needed on the exact methodology of this monitoring to ensure that KCI thresholds will not be crossed unnoticed, and the monitoring should be explicitly linked to KCI measures. To improve, they could build on their existing monitoring infrastructure, which watches for “malicious attempts to prompt our models for harmful outputs”, to track the specific KRIs and KCIs they intend to monitor.

Quotes:

“We are also progressing work to further study models when in use and assess the real-world effectiveness of mitigations, while upholding stringent levels of privacy and confidentiality and benefiting from external expertise where appropriate.” (p. 8)

“Where applicable, we also consider risks within the context of customer deployments. For example, because many of our users start building applications through our application programming interfaces (APIs) before moving to more advanced deployments, we extensively test and secure our APIs. Our API V2 underwent a heavy security design review before we made it available.” (p. 10)

“Moreover, we identify risks across our broader technology stack and environment by performing continuous monitoring of our security controls using automated and manual techniques. Models are developed and deployed in broader computational environments, and effectively managing AI risks requires us to identify, assess, and mitigate information security threats or vulnerabilities that may arise in these environments.” (p. 6)

“Beyond simply offering these features, Cohere conducts various evaluations to ensure that models actually adhere to these guardrails.” (p. 11)

“Continuous monitoring to detect anomalies and security issues” (p. 13)

“Responsible Disclosure Policy to incent third-party security vulnerability discovery” (p. 14)

“Where Cohere has direct visibility into the use of its models during deployment, we use that visibility to monitor for malicious attempts to prompt our models for harmful outputs, revoking access from accounts that abuse our systems. Cohere partners closely with customers who deploy Cohere’s AI solutions privately or on third-party managed platforms to ensure that they understand and recognize their responsibility for implementing appropriate monitoring controls during deployment.”

3.2.2.2 Vetting of protocols by third parties (30%) 0%

There is no mention of KCIs protocols being vetted by third parties.

Quotes:

No relevant quotes found.

3.2.2.3 Replication of evaluations by third parties (30%) 10%

There is an indication that third parties red-team containment KCI measures to check that they meet the containment KCI threshold, but details on the process, required expertise, and methods are not given, and conducting independent testing remains discretionary. To improve, there should also be a process for third parties to replicate evaluations or conduct safeguard red teaming for deployment KCI measures.

Quotes:

“Cohere conducts multidisciplinary red teaming during both the model development phase and post-launch. These red teaming exercises may include independent external parties, such as NIST and Humane Intelligence, and are conducted based on realistic use cases to attempt to break the model’s ability to fulfill alignment on risk mitigation goals in order to elicit information about areas of improvement.” (p. 16)

3.2.3 Transparency of evaluation results (10%) 43%

3.2.3.1 Sharing of evaluation results with relevant stakeholders as appropriate (85%) 50%

There is a commitment to publish documentation of evaluation results. However, there is no commitment to notify government agencies if risk thresholds are exceeded, nor to make KCI assessments public.

Quotes:

“Documentation is a key aspect of our accountability to our customers, partners, relevant government agencies, and the wider public. To promote transparency about our practices, we:

  • Publish documentation regarding our models’ capabilities, evaluation results, configurable secure AI features, and model limitations for developers to safely and securely build AI systems using Cohere solutions. This includes model documentation, such as Cohere’s Usage Policy and Model Cards, and technical guides, such as Cohere’s LLM University. […]
  • Offer insights into our data management, security measures, and compliance through our Trust Center.” (pp. 17-18)

3.2.3.2 Commitment to non-interference with findings (15%) 0%

There is no commitment to allow reports detailing the results of external evaluations (i.e., any KRI or KCI assessments conducted by third parties) to be written independently and without interference or suppression.

Quotes:

No relevant quotes found.

3.2.4 Monitoring for novel risks (10%) 25%

3.2.4.1 Identifying novel risks post-deployment: engages in some process (post deployment) explicitly for identifying novel risk domains or novel risk models within known risk domains (50%) 50%

Their monitoring focuses mostly on security vulnerabilities; nonetheless, they describe performing “continuous monitoring” explicitly to “identify risks”. While these may not be novel risk domains, it suggests a willingness to detect novel threat models through observation in the deployment context.

Quotes:

“Moreover, we identify risks across our broader technology stack and environment by performing continuous monitoring of our security controls using automated and manual techniques. Models are developed and deployed in broader computational environments, and effectively managing AI risks requires us to identify, assess, and mitigate information security threats or vulnerabilities that may arise in these environments.” (p. 6)

“Cohere partners closely with customers who deploy Cohere’s AI solutions privately or on third-party managed platforms to ensure that they understand and recognize their responsibility for implementing appropriate monitoring controls during deployment.” (p. 12)

3.2.4.2 Mechanism to incorporate novel risks identified post-deployment (50%) 0%

Apart from incident response, no mechanism to incorporate risks identified post-deployment is detailed.

Quotes:

No relevant quotes found.


4.1 Decision-making (25%) 5%

4.1.1 The company has clearly defined risk owners for every key risk identified and tracked (25%) 10%

The framework specifies a delegation of authority for risk decisions, but only to a single executive for all risks.

Quotes:

“The final authority to determine if our products are safe, secure, and ready to be made available to our customers is delegated by Cohere’s CEO to Cohere’s Chief Scientist.” (p. 15)

4.1.2 The company has a dedicated risk committee at the management level that meets regularly (25%) 0%

No mention of a management risk committee.

Quotes:

No relevant quotes found.

4.1.3 The company has defined protocols for how to make go/no-go decisions (25%) 10%

The framework includes rudimentary protocols for decision-making.

Quotes:

“This decision is made on the basis of final, multi-faceted evaluations and testing.” (p. 15)
“We consider models safe and secure to launch when our evaluations and tests demonstrate no significant regressions compared to our previously launched model versions, so that performance and security is maintained or improved for every new significant model version. This is Cohere’s bright line for determining when a model is “acceptable” from a risk management perspective and ready to be launched.” (p. 16)

4.1.4 The company has defined escalation procedures in case of incidents (25%) 0%

No mention of escalation procedures.

Quotes:

No relevant quotes found.

4.2. Advisory and Challenge (20%) 6%

4.2.1 The company has an executive risk officer with sufficient resources (16.7%) 25%

Not explicitly a risk officer, but the Chief Scientist seems to partly play this role.

Quotes:

“The final authority to determine if our products are safe, secure, and ready to be made available to our customers is delegated by Cohere’s CEO to Cohere’s Chief Scientist.” (p. 15)

4.2.2 The company has a committee advising management on decisions involving risk (16.7%) 0%

No mention of an advisory committee.

Quotes:

No relevant quotes found.

4.2.3 The company has an established system for tracking and monitoring risks (16.7%) 10%

The framework has a rudimentary mention of consistent review.

Quotes:

“Cohere consistently reviews state-of-the-art research and industry practice regarding the risks associated with AI, and uses this to determine our priorities.” (p. 6)

4.2.4 The company has designated people that can advise and challenge management on decisions involving risk (16.7%) 0%

No mention of people that challenge decisions.

Quotes:

No relevant quotes found.

4.2.5 The company has an established system for aggregating risk data and reporting on risk to senior management and the Board (16.7%) 0%

No mention of a system to aggregate and report risk data.

Quotes:

No relevant quotes found.

4.2.6 The company has an established central risk function (16.7%) 0%

No mention of a central risk function.

Quotes:

No relevant quotes found.

4.3 Audit (20%) 13%

4.3.1 The company has an internal audit function involved in AI governance (50%) 0%

No mention of an internal audit function.

Quotes:

No relevant quotes found.

4.3.2 The company involves external auditors (50%) 25%

The framework laudably specifies the independence of the external testers.

Quotes:

“Prior to major model releases, Cohere also performs robust vulnerability management testing, including independent third-party penetration testing of model containers.” (p. 16)
“These red teaming exercises may include independent external parties, such as NIST and Humane Intelligence.” (p. 16)

4.4 Oversight (20%) 0%

4.4.1 The Board of Directors of the company has a committee that provides oversight over all decisions involving risk (50%) 0%

No mention of a Board risk committee.

Quotes:

No relevant quotes found.

4.4.2 The company has other governing bodies outside of the Board of Directors that provide oversight over decisions (50%) 0%

No mention of any additional governance bodies.

Quotes:

No relevant quotes found.

4.5 Culture (10%) 7%

4.5.1 The company has a strong tone from the top (33.3%) 10%

The framework includes a brief mention of controls.

Quotes:

“At Cohere, we recognize that properly securing AI requires going beyond traditional controls.” (p. 8)

4.5.2 The company has a strong risk culture (33.3%) 0%

The framework states the existence of a security-first culture, but does not offer much detail.

Quotes:

“Cohere’s security-first culture drives how we work together to design, operate, continuously monitor, and secure both our internal environment (i.e., network, applications, endpoints, data, and personnel) and customer and partner deployments.” (p. 8)

4.5.3 The company has a strong speak-up culture (33.3%) 0%

No mention of elements of speak-up culture.

Quotes:

No relevant quotes found.

4.6 Transparency (5%) 28%

4.6.1 The company reports externally on what their risks are (33.3%) 50%

The framework mentions which risks are in scope and includes a commitment to publish information regarding these risks.

Quotes:

“We place potential risks arising from LLM capabilities into one of two categories:

  1. Risks stemming from possible malicious use of foundation AI models, such as generating content to facilitate cybercrime or child sexual exploitation
  2. Risks stemming from possible harmful outputs in the ordinary, non-malicious use of foundation models, such as outputs that are inaccurate in a way that has a harmful impact on a person or a group” (p. 6)

“Documentation is a key aspect of our accountability to our customers, partners, relevant government agencies, and the wider public. To promote transparency about our practices, we: Publish documentation regarding our models’ capabilities, evaluation results, configurable secure AI features, and model limitations for developers to safely and securely build AI systems using Cohere solutions. This includes model documentation, such as Cohere’s Usage Policy and Model Cards, and technical guides, such as Cohere’s LLM University.” (p. 17)

4.6.2 The company reports externally on what their governance structure looks like (33.3%) 10%

The framework includes rudimentary governance elements.

Quotes:

“The final authority to determine if our products are safe, secure, and ready to be made available to our customers is delegated by Cohere’s CEO to Cohere’s Chief Scientist.” (p. 15)

4.6.3 The company shares information with industry peers and government bodies (33.3%) 25%

The framework lists several external actors, but not specifically authorities.

Quotes:

“Cohere is committed to building a responsible, safe, and secure AI ecosystem, and actively engages with external actors to continuously improve our own practices, as well as to advance the state-of-the art on AI risk management. In particular, Cohere contributes to the development of critical guidance and industry standards with organisations such as: OWASP Top 10 for Large Language Models and Generative AI, CoSAI (Coalition for Secure AI) — founding member, CSA (Cloud Security Alliance), ML Commons”. (p. 19)
“Cohere also engages in cooperation with international AI Safety Institutes and external researchers to advance the scientific understanding of AI risks, for example by submitting our public models for inclusion on public benchmarks and red teaming exercises.” (p. 19)
