AI Safeguards as Affordance Modulation: Embedded Population Assumptions in Agentic Systems

Published: 23 May 2026, Last Modified: 23 May 2026
Venue: ICML 2026 AIWILD
License: CC BY 4.0
Keywords: AI safety, affordance theory, agentic AI, safeguards
TL;DR: AI safety treats safeguards as model properties; we argue they are relational, and agentic deployment categorically violates the human-calibrated configuration current safeguards assume.
Abstract: AI safeguards are evaluated against fixed adversaries and reported as passing or failing. We argue this practice rests on a category error: safeguards are not properties of models but properties of the relationship between models, agents, and environments. We build a framework for analysing and designing safeguards as relational interventions, grounded in Davis's (2020) Mechanisms and Conditions framework. The framework consists of three components: (i) the embedded population assumption (EPA) as the unit of analysis, specifying the agent–environment configuration against which a safeguard was calibrated; (ii) a typology of three modulation classes (suppressive, frictional, allocative) derived from Davis's conditions axis, each identifying a dominant relational variable and a characteristic failure mode; and (iii) a stability condition determining when safeguards hold under distributional shift. We demonstrate the framework by showing that agentic deployment categorically violates the human-calibrated EPA underlying current safeguards, and extract design principles that reorient safeguard development from point-hardening against fixed adversaries to distributional robustness across agent populations.
Track: Short Paper (4 pages)
Submission Number: 280