Keywords: AI assessment; AI interrogation; benchmarking; evaluation; taskable AI
TL;DR: We investigate why AI assessment requires more than an extrapolation of existing paradigms for verification and validation, and identify concrete desiderata and promising directions for research on formal assessment of AI systems.
Abstract: How should we assess the safety and functionality of taskable AI systems that are designed to continually learn and solve user-desired tasks in user-specific environments? From household robotics to digital assistants that can make potentially dangerous changes to their operational environments, this question is central to realizing the promise of AI. We investigate why answering this question requires more than an extrapolation of existing paradigms for verification and validation, and identify concrete desiderata and promising directions for research on formal assessment of AI systems.
Paper Type: New Full Paper
Submission Number: 12