Keywords: Agents, Multi-turn interaction, Risk amplification, Markov decision process
Abstract: The safety of AI agents in multi-turn interaction is a growing concern, particularly because agent behavior may change over time as both the agent and its environment evolve. We introduce the concept of “state-induced risk amplification”. We hypothesize that extended agent-environment interaction can lead the agent to take actions that move the system into risky states, and that being in such states may in turn increase the likelihood of further risky actions by the agent. We formalize these effects within the Markov decision process (MDP) framework. To test our hypotheses empirically, we introduce a novel experimental setup inspired by traffic-monitoring applications. Our results show that state-induced risk amplification occurs in practice. This highlights an emerging safety risk for current multi-turn agents and motivates safety evaluation methods that account for state-dependent dynamics. We also discuss implications for designing adaptive risk-mitigation strategies.
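To make the hypothesized feedback loop concrete, the following is a minimal sketch of a toy two-state MDP. It is not the authors' formalization or experimental setup. Assumptions (all hypothetical): the system is either `SAFE` or `RISKY`; the agent's probability of a risky action depends on the current state; a risky action always moves the system to `RISKY`; and a safe action taken in `RISKY` restores `SAFE` with probability `p_recover`. The function `risky_action_curve` propagates the state distribution exactly and returns the probability of a risky action at each step.

```python
from dataclasses import dataclass

SAFE, RISKY = 0, 1


@dataclass(frozen=True)
class ToyAgentMDP:
    """Hypothetical two-state model; every parameter is an illustrative assumption."""
    p_risky_action_safe: float = 0.05   # policy: P(risky action | SAFE state)
    p_risky_action_risky: float = 0.30  # policy: P(risky action | RISKY state)
    p_recover: float = 0.5              # P(safe action in RISKY returns the system to SAFE)

    def policy(self, state: int) -> float:
        """Probability that the agent takes a risky action in `state`."""
        return self.p_risky_action_risky if state == RISKY else self.p_risky_action_safe

    def prob_next_risky(self, state: int) -> float:
        """P(s_{t+1} = RISKY | s_t = state) under the assumed dynamics."""
        p = self.policy(state)
        # A risky action always moves the system to RISKY.
        # A safe action from RISKY leaves it there unless recovery succeeds;
        # a safe action from SAFE keeps it SAFE.
        stay_risky = (1.0 - p) * (1.0 - self.p_recover) if state == RISKY else 0.0
        return p + stay_risky


def risky_action_curve(mdp: ToyAgentMDP, horizon: int) -> list:
    """Exact P(risky action at step t) for t = 0..horizon-1, starting from SAFE."""
    p_state_risky = 0.0  # P(s_t = RISKY); the episode starts in SAFE
    curve = []
    for _ in range(horizon):
        p_safe = 1.0 - p_state_risky
        # Marginal risky-action probability: sum over states of P(s) * policy(s).
        curve.append(p_safe * mdp.policy(SAFE) + p_state_risky * mdp.policy(RISKY))
        # Propagate the state distribution one step forward.
        p_state_risky = (p_safe * mdp.prob_next_risky(SAFE)
                         + p_state_risky * mdp.prob_next_risky(RISKY))
    return curve


if __name__ == "__main__":
    print([round(x, 4) for x in risky_action_curve(ToyAgentMDP(), 4)])
    # prints [0.05, 0.0625, 0.07, 0.0745]
```

With these made-up numbers, the per-step risky-action probability rises from its initial value of 0.05 toward a stationary value of 0.08125, even though the policy itself never changes. The increase comes entirely from the system drifting into risky states. If `p_risky_action_risky` equals `p_risky_action_safe`, the curve stays flat. This shows that, in this toy model, the amplification depends on the policy being state-dependent.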
Submission Number: 57