Scaling Up Mischief: Red-Teaming AI and Distributing Governance

Jacob Metcalf, Ranjit Singh

Published: 13 Dec 2023 · Last Modified: 22 Dec 2025 · Harvard Data Science Review · CC BY-SA 4.0
Abstract: Red-teaming is an emergent strategy for governing large language models (LLMs) that borrows heavily from cybersecurity methods. Policymakers and developers alike have embraced this promising, yet largely unvalidated, approach to regulating generative AI. We argue that AI red-teaming efforts address a specific moderation need of LLM developers: scaling up human mischievousness by inviting a wide diversity of people to make the system misbehave in unsafe or dangerous ways. However, there are significant methodological challenges in connecting the practices of AI red-teaming to the broad range of AI harms that policymakers intend it to address. Caution is warranted as policymakers and developers invest significant resources into AI red-teaming.