Track: Track 1: Original Research/Position/Education/Attention Track
Abstract: AI scientists are moving from assistive tools toward systems that can generate hypotheses, orchestrate tools, interpret intermediate evidence, and in some domains close the loop with experimental execution. Existing evaluations increasingly measure capability, but capability alone does not tell us when one concrete AI scientist run should count as trustworthy enough to inspect, replay, or authorize. In autonomous laboratories, that distinction matters because planning, sensing, tool use, coordination, and irreversible actions are coupled at the level of a single run. We introduce MADS-CPS, a machine-checkable run-level admissibility contract that specifies a declared assurance envelope, a conformance checker over required artifacts and replay status, verification modes for restricted auditability, and fail-closed point-of-no-return release gating. We instantiate the framework in a robot-centric autonomous-laboratory profile and evaluate it through an eight-case conformance challenge corpus, a verification-mode admissibility matrix, an independent replay-link experiment, and a controller-matrix study spanning baseline and stressed regimes. Across these studies, MADS-CPS achieves perfect checker agreement on injected faults, a strong replay match rate of 1.00 in the reported E3 settings, raw controller invariance in baseline settings, normalized-interface invariance in baseline and moderate settings, and interpretable controller divergence under harder coordination stress. These results suggest that run-level admissibility can remain machine-checkable even when productivity and controller behavior separate under stress.
Keywords: AI for Science, AI Scientists, Autonomous Laboratories, Machine-Checkable Assurance, Run-Level Admissibility, Auditability, Replay, Cyber-Physical Systems
Submission Number: 185