Abstract: We develop a learning-theoretic framework for analyzing self-improving agents by decomposing self-modification into five axes. Within this framework, we prove a sharp boundary: under standard i.i.d. assumptions, distribution-free PAC learnability is preserved if and only if the policy-reachable family remains uniformly capacity-bounded. If reachable capacity can grow without bound, utility-rational self-changes can make learnable tasks unlearnable. We further introduce a simple Two-Gate guardrail—a validation-improvement requirement plus a capacity cap—that preserves this boundary and yields standard VC-rate guarantees. The broader implication is that self-modification must be constrained not only by objectives, but also by structural conditions that preserve the statistical prerequisites for learning. As AI systems become increasingly intelligent and autonomous, we view this framework as an important foundation for a statistical theory of self-improvement.
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Mirco_Mutti1
Submission Number: 8163
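The Two-Gate guardrail described in the abstract can be illustrated with a minimal sketch. This is an assumption-laden reading of the abstract, not the paper's implementation: the names `Candidate`, `val_loss`, and `capacity` are hypothetical, and "capacity" stands in for whatever VC-dimension proxy the paper uses for the policy-reachable family.

```python
# Illustrative sketch of a Two-Gate guardrail for self-modification,
# assuming two gates as stated in the abstract:
#   Gate 1 (validation improvement): the proposed change must improve
#     held-out validation loss.
#   Gate 2 (capacity cap): the reachable hypothesis family must stay
#     uniformly capacity-bounded (here, a scalar "capacity" proxy).
# All names here are hypothetical, not from the paper.

from dataclasses import dataclass


@dataclass
class Candidate:
    val_loss: float  # loss on a held-out validation set
    capacity: int    # proxy for the capacity of the reachable family


def two_gate_accept(current: Candidate, proposed: Candidate,
                    capacity_cap: int, min_improvement: float = 0.0) -> bool:
    """Accept a self-modification only if both gates pass."""
    improves = proposed.val_loss < current.val_loss - min_improvement
    bounded = proposed.capacity <= capacity_cap
    return improves and bounded


# A modest improvement within the cap is accepted; a large apparent
# improvement that blows past the cap is rejected, preserving the
# uniform capacity bound that the PAC guarantee requires.
current = Candidate(val_loss=0.30, capacity=50)
good = Candidate(val_loss=0.25, capacity=60)
risky = Candidate(val_loss=0.10, capacity=10_000)

print(two_gate_accept(current, good, capacity_cap=100))   # True
print(two_gate_accept(current, risky, capacity_cap=100))  # False
```

The point of the second example is the one the abstract emphasizes: a utility-rational change (lower loss) can still be rejected, because unbounded growth in reachable capacity is exactly what breaks distribution-free learnability.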