This position paper contends that modern AI research must adopt an antifragile perspective on safety: one in which a system's capacity to handle rare or out-of-distribution (OOD) events adapts and expands over repeated exposures. Conventional static benchmarks and single-shot robustness tests overlook the reality that environments evolve and that models, if left unchallenged, can drift into maladaptation (e.g., reward hacking, over-optimization, or atrophy of broader capabilities). We argue that an antifragile approach, which leverages current uncertainties to prepare for potentially greater and more unpredictable future uncertainties rather than striving merely to reduce them, is pivotal for the long-term reliability of open-ended ML systems. We first identify key limitations of static testing, including limited scenario diversity, reward hacking, and over-alignment. We then explore the potential of dynamic, antifragile solutions for managing rare events. Crucially, we advocate a fundamental recalibration of the methods used to measure, benchmark, and continually improve AI safety over the long term, complementing existing robustness approaches with ethical and practical guidelines for fostering an antifragile AI safety community.
Problem: Current AI safety approaches test systems once and declare them robust, but real-world environments constantly evolve, introducing new attack methods, unexpected user behaviors, and environmental shifts that were not anticipated during development.
Solution: We propose ``antifragile'' AI safety, inspired by biological immune systems that grow stronger after exposure to threats. Instead of hoping that our initial safety tests cover everything, we design AI systems that continuously learn from new failures and stress-test themselves in safe environments. When a system encounters an unexpected problem, it does not just patch that specific issue; it uses the experience to become more robust against similar future threats.
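To make this loop concrete, below is a minimal, hypothetical sketch (not the paper's implementation): a toy model whose pool of stress scenarios grows with every observed failure, so later evaluations run against an ever-expanding set of rare events rather than a fixed benchmark. All names here (`StressCase`, `ToyModel`, `antifragile_loop`) are illustrative assumptions, and "robustness" is reduced to a single scalar for clarity.

```python
import random
from dataclasses import dataclass


@dataclass
class StressCase:
    """One rare / out-of-distribution scenario, reduced here to a severity score."""
    severity: float


@dataclass
class ToyModel:
    """Stand-in for a deployed system; its 'robustness' is a scalar it can grow."""
    robustness: float = 1.0

    def survives(self, case: StressCase) -> bool:
        return case.severity <= self.robustness

    def adapt(self, case: StressCase) -> None:
        # Over-compensate slightly: prepare for cases somewhat harder than the one observed.
        self.robustness = max(self.robustness, 1.1 * case.severity)


def antifragile_loop(rounds: int = 10, seed: int = 0) -> ToyModel:
    rng = random.Random(seed)
    model = ToyModel()
    stress_pool: list[StressCase] = []  # grows with every new failure

    for t in range(rounds):
        # 1) The environment produces a new, possibly harsher scenario over time.
        new_case = StressCase(severity=rng.uniform(0.5, 1.0) * (1.0 + 0.2 * t))

        # 2) On failure, do more than patch: keep the case and adapt beyond it.
        if not model.survives(new_case):
            stress_pool.append(new_case)
            model.adapt(new_case)

        # 3) Re-test against the whole accumulated pool, not a fixed benchmark.
        regressions = sum(not model.survives(c) for c in stress_pool)
        print(f"round {t}: robustness={model.robustness:.2f}, "
              f"pool={len(stress_pool)}, regressions={regressions}")

    return model


if __name__ == "__main__":
    antifragile_loop()
```

In a real system, the stress pool would be populated by red-teaming, simulation, or logged deployment incidents rather than a random-number generator, but the structural point is the same: each failure permanently enlarges the test distribution the system must keep passing.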
Impact: This approach could prevent catastrophic AI failures by ensuring systems improve from every new challenge they encounter, rather than becoming brittle over time. Instead of playing an endless game of whack-a-mole with new vulnerabilities, we can build AI that evolves to handle tomorrow's unknown threats. This is crucial as AI systems become more powerful and are deployed in critical areas like healthcare, infrastructure, and finance where unexpected failures could have severe consequences.