Abstract: As retrieval-augmented AI agents become more embedded in society, their safety properties and ethical behavior remain insufficiently understood. In particular, the growing integration of LLMs and AI agents raises critical questions about how they engage with and are influenced by social environments. This study investigates how expanding retrieval access (from no external sources, to Wikipedia-based retrieval, to open web search) affects model reliability, bias propagation, and harmful content generation. Through extensive benchmarking of censored and uncensored LLMs, we observe a phenomenon we term safety devolution: increased web access correlates with declining response rates, reduced refusal of unsafe prompts, and amplified bias. Notably, even ethically aligned LLMs exhibit safety devolution when granted unrestricted web retrieval, performing comparably to uncensored models. These findings underscore the need for robust mitigation strategies to ensure fairness and reliability in retrieval-augmented and increasingly autonomous AI systems.
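The evaluation setup sketched in the abstract (the same model queried with no retrieval, Wikipedia-based retrieval, and open web search, then scored on refusal of unsafe prompts) could be probed with a harness roughly like the following minimal sketch. This is not the authors' code: `query_model`, `retrieve`, `wikipedia_search`, `web_search`, and `load_benchmark` are hypothetical placeholders for an LLM call, retrieval backends, and a prompt-set loader, and the string-matching refusal heuristic is illustrative only; the paper's actual metrics may differ.

```python
# Sketch (under the assumptions stated above) of comparing one model's
# refusal behavior across three retrieval configurations.
from typing import Callable, List, Optional


def refusal_rate(
    prompts: List[str],
    query_model: Callable[[str], str],                 # hypothetical LLM call
    retrieve: Optional[Callable[[str], str]] = None,   # None = no external sources
) -> float:
    """Fraction of unsafe prompts the model refuses under a given retrieval setup."""
    refusals = 0
    for prompt in prompts:
        context = retrieve(prompt) if retrieve else ""
        answer = query_model(f"{context}\n\n{prompt}" if context else prompt)
        # Crude keyword-based refusal check; a real study would use a stronger judge.
        if any(marker in answer.lower() for marker in ("i can't", "i cannot", "i won't")):
            refusals += 1
    return refusals / len(prompts)


# Usage: the same model under the three access tiers (all names hypothetical).
# unsafe_prompts = load_benchmark("unsafe_prompts")
# no_retrieval = refusal_rate(unsafe_prompts, query_model)                    # no external sources
# wiki         = refusal_rate(unsafe_prompts, query_model, wikipedia_search)  # Wikipedia RAG
# open_web     = refusal_rate(unsafe_prompts, query_model, web_search)        # open web search
```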
Track: tiny / short paper (up to 5 pages)
Keywords: Ethical AI, LLM Agents, RAG, Safety Devolution
Submission Number: 18