AgentMisalignment: Measuring the Propensity for Misaligned Behaviour in LLM‑Based Agents
Keywords: AI Safety, AI Alignment, Model Evaluation, Sandbagging, LLM Agents, Inspect, Misalignment
TL;DR: We develop a suite of evaluations to measure the propensity of LLM agents to perform misaligned actions in real-world settings.
Abstract: As Large Language Model (LLM) agents become more widespread, associated misalignment risks increase. While prior research has studied agents' ability to produce harmful outputs or follow malicious instructions, it remains unclear how likely agents are to spontaneously pursue unintended goals in realistic deployments. To address this gap, we define a new class of alignment failures, called intent misalignment, distinct from adversarial prompting or capability elicitation, in which agents spontaneously pursue goals that diverge from deployer intentions. We then introduce AgentMisalignment, a benchmark of nine evaluations measuring behavioral propensity for intent misalignment in authentic deployment contexts. Key findings reveal that intent misalignment correlates with model size and that personality conditioning exposes widespread vulnerabilities. By evaluating complete behavioral traces through the intent misalignment lens, our benchmark uncovers failure patterns invisible to standard capability testing.
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 111