Behavioral Code: Legible, Auditable Loops for Autonomous Agents

Published: 23 May 2026, Last Modified: 23 May 2026 | Venue: ICML 2026 AIWILD | Readers: Everyone | License: CC BY 4.0
Keywords: AI agents, autonomous agents, AI safety, agent oversight, interpretability, auditability, guardrails, provenance, automated research, software agents
TL;DR: We introduce behavioral code, a reviewable and enforceable way to make autonomous agent loops legible, auditable, and grounded in explicit predictions, provenance, and execution traces.
Abstract: Autonomous research agents can now run unattended for hours, producing code changes, logs, and metrics, yet leave no readable account of what situations they recognized, why they acted, or what they expected to happen next. In many current systems, the closest thing to a behavioral specification is a prompt plus a tool list: reviewers can inspect the scaffold and the execution trace, but not a structured account of the agent's intended behavior. We introduce behavioral code: a reviewable program of named concepts and reactions that makes an agent loop's behavior explicit. Concepts describe human-meaningful situations in the loop, while reactions specify how the agent should respond. Each reaction cites the events that triggered it and records predictions before outcomes are observed. This lets later successes and failures be compared against explicit behavioral commitments, rather than explained after the fact. We pair this representation with a lightweight harness that enforces declared commitments against the execution trace. We demonstrate the system in two case studies: a language-model training loop and an exploratory C++ performance loop. Across both settings, the concepts make the loop legible; the harness makes its execution auditable.
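A minimal Python sketch of the representation described in the abstract, under stated assumptions. The names Event, Concept, Reaction, Record, run_loop, and audit are hypothetical and are not the paper's API. Concepts recognize situations in a toy loss trace. Reactions declare an action and a predicted outcome. Each firing cites its triggering event and stamps the prediction on a logical clock before the outcome is observed. The audit function stands in for the harness and checks three illustrative commitments (chosen for this sketch, not taken from the paper): every executed action is backed by a declared reaction, every firing cites known events, and every prediction precedes its outcome. Prediction mismatches are reported separately as findings, not violations.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Event:
    """One observation from the loop's execution trace (hypothetical schema)."""
    id: int
    kind: str
    value: float


@dataclass(frozen=True)
class Concept:
    """A named, human-meaningful situation the loop can recognize."""
    name: str
    recognizes: Callable[[Event], bool]


@dataclass(frozen=True)
class Reaction:
    """The declared response to a concept and the outcome it predicts."""
    concept: str
    action: str
    prediction: str


@dataclass
class Record:
    """One firing of a reaction: cited events plus a prediction stamped before the outcome."""
    reaction: Reaction
    cited: Tuple[int, ...]
    predicted_at: int
    outcome: Optional[str] = None
    observed_at: Optional[int] = None


def run_loop(events: List[Event], concepts: List[Concept],
             reactions: Dict[str, Reaction],
             act: Callable[[str, Event], str]) -> Tuple[List[Record], List[str]]:
    """Fire reactions on recognized events; the prediction is recorded before acting."""
    records: List[Record] = []
    executed: List[str] = []
    clock = 0  # logical clock used to order predictions and outcomes
    for ev in events:
        for concept in concepts:
            if not concept.recognizes(ev):
                continue
            reaction = reactions[concept.name]
            clock += 1
            rec = Record(reaction, cited=(ev.id,), predicted_at=clock)
            records.append(rec)  # the commitment exists before the action runs
            executed.append(reaction.action)
            clock += 1
            rec.outcome, rec.observed_at = act(reaction.action, ev), clock
    return records, executed


def audit(records: List[Record], events: List[Event], executed: List[str]) -> List[str]:
    """Compare declared commitments with the execution trace."""
    findings: List[str] = []
    known = {ev.id for ev in events}
    declared = Counter(r.reaction.action for r in records)
    # Counter subtraction keeps only actions executed more often than declared.
    for action, n in (Counter(executed) - declared).items():
        findings.append(f"violation: undeclared action {action!r} (x{n})")
    for i, r in enumerate(records):
        if not r.cited or not set(r.cited) <= known:
            findings.append(f"violation: record {i} has missing or unknown citations")
        if r.observed_at is not None and r.predicted_at >= r.observed_at:
            findings.append(f"violation: record {i} prediction not stamped before outcome")
        if r.outcome is not None and r.outcome != r.reaction.prediction:
            findings.append(f"mismatch: record {i} predicted {r.reaction.prediction!r}, "
                            f"observed {r.outcome!r}")
    return findings


# Toy training-loop trace: loss values, with a "loss_spike" concept above 2.0.
events = [Event(0, "loss", 1.2), Event(1, "loss", 3.5), Event(2, "loss", 4.1)]
concepts = [Concept("loss_spike", lambda e: e.kind == "loss" and e.value > 2.0)]
reactions = {"loss_spike": Reaction("loss_spike", "halve_lr", "loss_recovers")}


def act(action: str, ev: Event) -> str:
    # Stand-in environment: only the first spike recovers after halving the LR.
    return "loss_recovers" if ev.id == 1 else "loss_still_high"


records, executed = run_loop(events, concepts, reactions, act)
executed.append("restart_run")  # simulate an action the trace shows but no reaction declared
findings = audit(records, events, executed)
for finding in findings:
    print(finding)
```

Running it reports the undeclared restart_run action and the failed prediction for the second loss spike. The sketch orders predictions and outcomes on a logical clock instead of wall-clock timestamps, which keeps the ordering check deterministic. A real harness would presumably read these stamps from the agent's own logs.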
Track: Short Paper (4 pages)
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 213