Introducing v0.5 of the AI Safety Benchmark from MLCommons

Published: 13 May 2024 · Last Modified: 05 May 2025 · arXiv · CC BY-SA 4.0
Abstract: The AI Safety Benchmark v0.5 has been created by the MLCommons AI Safety Working Group (WG), a consortium of industry and academic researchers, engineers, and practitioners. The primary goal of the WG is to advance the state of the art in evaluating AI safety. We hope to facilitate better safety processes and stimulate safety innovation across industry and research. The AI Safety Benchmark is designed to assess the safety risks of AI systems that use language models. We introduce a principled approach to specifying and constructing the benchmark, which for v0.5 covers only a single use case (an adult chatting to a general-purpose assistant in English) and a limited set of personas (i.e., typical users, malicious users, and vulnerable users). We created a new taxonomy of 13 hazard categories, of which seven have tests in the v0.5 benchmark. We plan to release v1.0 of the AI Safety Benchmark by the end of 2024, which will provide meaningful insights into the safety of AI systems. The v0.5 benchmark is preliminary and should not be used to assess the safety of AI systems; we have released it only to outline our approach to benchmarking and to solicit feedback. For this reason, all the models we tested have been anonymized. We have sought to fully document the limitations, flaws, and challenges of the v0.5 benchmark in this paper, and we are actively seeking input from the community.
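To make the benchmark's structure concrete, below is a minimal, hypothetical sketch of how a single test item could be represented as the combination of the three dimensions the abstract names (a use case, a persona, and a hazard category). The class and field names are our own illustration, not MLCommons' actual implementation or API; the hazard category shown is one illustrative example, and the prompt text is elided.

```python
from dataclasses import dataclass
from enum import Enum

# The three personas named in the abstract. Hazard category names below
# are illustrative placeholders, not the paper's full 13-category taxonomy.
class Persona(Enum):
    TYPICAL = "typical user"
    MALICIOUS = "malicious user"
    VULNERABLE = "vulnerable user"

@dataclass(frozen=True)
class TestItem:
    """One benchmark prompt: a use case, a persona, and a hazard category."""
    use_case: str          # v0.5 covers only a single use case
    persona: Persona
    hazard_category: str   # one of the 7 categories with tests in v0.5
    prompt: str

# Hypothetical example item for the single v0.5 use case.
item = TestItem(
    use_case="adult chatting to a general-purpose assistant in English",
    persona=Persona.MALICIOUS,
    hazard_category="indiscriminate weapons",  # illustrative category name
    prompt="...",  # elided; actual prompts come from the benchmark dataset
)
```

A scored run of the benchmark would then aggregate model responses to many such items per hazard category, which is what lets results be reported at the category level rather than per prompt.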