Position: Safe AI Should be Resistant and Resilient in an Evolving World

Published: 30 Apr 2026, Last Modified: 24 Jun 2026, ICML 2026 Position Paper Track (regular), License: CC BY-NC 4.0
Abstract: In this position paper, we address the persistent gap between rapidly growing AI capabilities and lagging safety progress. Existing paradigms fall into two camps: "Make AI Safe", which applies post-hoc alignment and guardrails but remains brittle and reactive, and "Make Safe AI", which emphasizes intrinsic safety but struggles to address unforeseen risks in open-ended environments. We therefore propose safe-by-coevolution, a new formulation of the "Make Safe AI" paradigm inspired by biological immunity, in which safety becomes a dynamic, adversarial, and ongoing learning process. To operationalize this vision, we introduce R²AI (Resistant and Resilient AI), a practical framework that unites resistance against known threats with resilience to unforeseen risks. R²AI integrates fast and slow safe models, adversarial simulation and verification through a safety wind tunnel, and continual feedback loops that guide the coevolution of safety and capability. We argue that this framework offers a scalable and proactive path to maintaining safety continually in dynamic environments, addressing both near-term vulnerabilities and long-term existential risks as AI advances toward artificial general intelligence (AGI) and artificial superintelligence (ASI).
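The abstract describes R²AI's components only at a high level. Purely as a toy illustration of how a fast path, a slow path, a wind tunnel, and a feedback loop could fit together (not the authors' method or implementation), the sketch below assumes: the fast model is a case-insensitive substring blocklist, the slow model is a costlier match on letter-only normalized text, the "wind tunnel" applies a few fixed obfuscations to a seed attack, and feedback copies slow-path catches into the fast path. All names (`ToyR2Guard`, `wind_tunnel`, `coevolve_round`) and the example attack string are hypothetical.

```python
import re
from dataclasses import dataclass, field


def normalize(text: str) -> str:
    """Slow-path preprocessing (assumed): keep only lowercase letters so spaced
    or hyphenated obfuscations collapse back to the underlying string."""
    return re.sub(r"[^a-z]", "", text.lower())


@dataclass
class ToyR2Guard:
    # Hypothetical blocklist of known attack strings (assumed lowercase).
    known_signatures: set[str]
    # Patterns the fast path matches; grows as the feedback loop runs.
    fast_patterns: set[str] = field(init=False)

    def __post_init__(self) -> None:
        self.fast_patterns = set(self.known_signatures)

    def fast_check(self, prompt: str) -> bool:
        """Resistance stand-in: cheap case-insensitive substring match."""
        p = prompt.lower()
        return any(pat in p for pat in self.fast_patterns)

    def slow_check(self, prompt: str) -> bool:
        """Resilience stand-in: costlier normalized match that sees through
        some variants the fast path misses."""
        p = normalize(prompt)
        return any(normalize(sig) in p for sig in self.known_signatures)

    def is_blocked(self, prompt: str) -> bool:
        return self.fast_check(prompt) or self.slow_check(prompt)


def wind_tunnel(seed: str) -> list[str]:
    """Hypothetical attack generator: a few fixed obfuscations of a seed attack."""
    return [seed.upper(), "-".join(seed), " ".join(seed), seed.replace("e", "3")]


def coevolve_round(guard: ToyR2Guard, seeds: list[str]) -> tuple[list[str], list[str]]:
    """One round: attack the guard, then feed the results back into it."""
    learned, escaped = [], []
    for seed in seeds:
        for attack in wind_tunnel(seed):
            if guard.fast_check(attack):
                continue  # already resisted by the fast path
            if guard.slow_check(attack):
                # Feedback: distill the slow verdict into a new fast pattern.
                guard.fast_patterns.add(attack.lower())
                learned.append(attack)
            else:
                # Missed by both paths: surface it for review instead of ignoring it.
                escaped.append(attack)
    return learned, escaped


if __name__ == "__main__":
    guard = ToyR2Guard({"ignore previous instructions"})
    learned, escaped = coevolve_round(guard, ["ignore previous instructions"])
    print(len(learned), escaped)  # 2 ['ignor3 pr3vious instructions']
```

In this toy run, the hyphenated and spaced variants are caught only by the slow path and then become fast-path patterns, while the digit-substitution variant evades both and is returned in `escaped`. That last case is the kind of unforeseen risk the abstract assigns to resilience; the sketch makes no claim about how R²AI itself handles it.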
Primary Area: System Risks, Safety, and Government Policy
Keywords: AI Safety, Trustworthy AI, Risk
Submission Number: 912