The Novelty Ceiling: PAC-Theoretic Bounds on Autonomous Scientific Discovery and the Minimum Oversight Rate

Siddharth Karuturi; Kaustubh S. Bukkapatnam; Laksh Patel; Tanush Ajay Shastry

Published: 30 May 2026, Last Modified: 30 May 2026 | ICML2026-AI4Science Poster | CC BY 4.0

Track: Track 1: Original Research/Position/Education/Attention Track

TL;DR: We use PAC learning theory to prove that autonomous AI scientists are inherently bounded by a training-data-defined "novelty ceiling," and we derive the exact minimum human oversight rate required to achieve genuinely novel scientific discoveries.

Abstract: Autonomous AI scientists that both generate and evaluate hypotheses in closed loops are increasingly deployed across the natural sciences. We demonstrate, both theoretically and empirically, that such systems are fundamentally bounded by what we call the \textit{novelty ceiling}: a hard limit on the structural distance from the training corpus beyond which the learned evaluator provides no reliable signal. Using PAC learning theory, we prove that the ceiling is determined by corpus diameter and VC-dimension, not by model scale or runtime, and that without human intervention the self-improvement loop converges to generating hypotheses within this ceiling at rate $O(1/T)$. We derive a closed-form minimum oversight rate $r^*$, the fraction of hypotheses that must be routed to a human expert to maintain a target rate of genuinely novel discoveries. We further prove that injecting structurally diverse seeds raises the ceiling and reduces $r^*$ exponentially in the seed count, establishing a formal substitution rate between curated data investment and live human effort. Finally, we show that novelty-triggered oversight strictly dominates random and uncertainty-triggered oversight at any fixed budget. Experiments on symbolic regression over the Feynman benchmark and a drug--target interaction loop corroborate all theoretical predictions, with empirical ceilings consistently within 8\% of our analytical bound. Our results provide the first principled, computable answer to the question of when AI scientists can function as tools or co-authors and when they require human oversight to produce founder-level discoveries.
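
To make the idea of novelty-triggered oversight at a fixed budget concrete, the following is a minimal sketch and not the paper's method. It assumes hypotheses and corpus items are already embedded as numeric vectors, and it uses Euclidean distance to the nearest corpus item as a stand-in for the structural distance mentioned in the abstract. The names `novelty` and `route_by_novelty` are hypothetical.

```python
import math


def novelty(hypothesis, corpus):
    # Assumption: distance to the nearest corpus point is a proxy for
    # "structural distance from the training corpus". The paper's own
    # distance measure is not given in the abstract.
    return min(math.dist(hypothesis, item) for item in corpus)


def route_by_novelty(hypotheses, corpus, budget):
    # Send the `budget` most novel hypotheses to a human expert.
    # Returns their indices in ascending order.
    scores = [novelty(h, corpus) for h in hypotheses]
    ranked = sorted(range(len(hypotheses)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:budget])


if __name__ == "__main__":
    corpus = [(0.0, 0.0), (1.0, 0.0)]
    hypotheses = [(0.1, 0.0), (5.0, 5.0), (1.0, 0.2), (3.0, 0.0)]
    # With a budget of 2, the two hypotheses farthest from the corpus are routed.
    print(route_by_novelty(hypotheses, corpus, budget=2))
```

Under this sketch, the oversight rate is simply `budget / len(hypotheses)`; a random policy would pick the same number of indices uniformly, regardless of distance from the corpus.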

Keywords: Autonomous Scientific Discovery, PAC Learning Theory, Novelty Ceiling, Human Oversight, Out-of-Distribution Generalization, AI Scientists

Submission Number: 189
