Safety Beyond Verification: The Need for Continual, User-Driven Assessment of AI Systems

Published: 15 Jun 2025, Last Modified: 07 Aug 2025, Venue: AIA 2025, License: CC BY 4.0
Keywords: AI assessment; AI interrogation; benchmarking; evaluation; taskable AI
TL;DR: We investigate why AI assessment requires more than an extrapolation of existing paradigms for verification and validation, and identify concrete desiderata and promising directions for research on formal assessment of AI systems.
Abstract: How should we assess the safety and functionality of taskable AI systems that are designed to continually learn and solve user-desired tasks in user-specific environments? From household robotics to digital assistants that can make potentially dangerous changes to their operational environments, this question is central to realizing the promise of AI. We investigate why answering this question requires more than an extrapolation of existing paradigms for verification and validation, and identify concrete desiderata and promising directions for research on formal assessment of AI systems.
Paper Type: New Full Paper
Submission Number: 12