Keywords: safety, domain-agnostic, AI safety
TL;DR: We propose a domain-agnostic AI safety-ensuring framework, along with a scaling law.
Abstract: AI safety has emerged as a critical priority as AI systems are increasingly deployed in real-world applications. We propose **the first domain-agnostic AI safety-ensuring framework** that achieves strong safety guarantees while preserving high performance, grounded in rigorous theoretical foundations. Our framework comprises: (1) an optimization component with chance constraints, (2) a safety classification model, (3) internal test data, (4) conservative testing procedures, (5) $\zeta$-informative dataset quality measures, and (6) continuous approximate loss functions with gradient computation. Furthermore, to our knowledge, we mathematically establish **the first scaling law in AI safety research**, relating data quantity to the safety-performance trade-off. Experiments across *reinforcement learning, natural language generation*, and *production planning* validate our framework and demonstrate superior performance. Notably, in reinforcement learning, we achieve **3 collisions over 10M actions, compared with 1,000–3,000 for PPO-Lag baselines** at equivalent performance levels, a safety level unattainable by previous AI methods. We believe our framework lays a new foundation for safe AI deployment across safety-critical domains.
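For context on component (1): a chance constraint bounds the *probability* of a safety violation rather than requiring the constraint to hold deterministically. The display below is the standard textbook form of a chance-constrained program, shown only as an illustrative sketch; the symbols $x$, $\xi$, $f$, $g$, and the violation tolerance $\delta$ are generic placeholders, not notation taken from the submission:

$$
\min_{x}\ \mathbb{E}_{\xi}\big[f(x,\xi)\big]
\quad \text{s.t.} \quad
\mathbb{P}_{\xi}\big(g(x,\xi) \le 0\big) \ge 1 - \delta,
$$

where $x$ is the decision variable, $\xi$ a random disturbance, $f$ the performance objective, $g$ the safety function, and $\delta \in (0,1)$ the allowed violation probability. Smaller $\delta$ yields stronger safety guarantees, typically at some cost in performance, which is the trade-off the abstract's scaling law concerns.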
Supplementary Material: zip
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 16418