AI4SLT: Empirical Processes in Lean 4 for Formal Statistical Learning Theory

Published: 30 Apr 2026, Last Modified: 24 Jun 2026 | ICML 2026 regular | Everyone | CC BY 4.0
TL;DR: The first comprehensive Lean 4 formalization of statistical learning theory, featuring Gaussian Lipschitz concentration and Dudley's entropy integral, which establishes a reusable foundation for formalizing ML theory.
Abstract: We present the first comprehensive Lean 4 formalization of statistical learning theory (SLT) grounded in empirical process theory. Our end-to-end formal infrastructure implements content missing from the latest Lean library, including a complete development of Gaussian Lipschitz concentration, Dudley’s entropy integral theorem for sub-Gaussian processes, and an application to least-squares (sparse) regression with a sharp rate. The project was carried out using a human-AI collaborative workflow in which humans design proof strategies and AI agents execute tactical proof construction, yielding a human-verified Lean 4 toolbox for SLT. Beyond implementation, the formalization process exposes and resolves implicit assumptions and missing details in standard SLT textbooks, enforcing a granular, line-by-line understanding of the theory. This work establishes a reusable formal foundation and opens the door to future developments in machine learning theory. The code is available at https://github.com/YuanheZ/lean-stat-learning-theory.
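For orientation, the two headline results named in the abstract can be sketched in their usual textbook form. This is only an illustrative statement under stated assumptions, not the exact formulation proved in the repository: the symbols X, f, L, T, d, N, D, and the universal constant C are notation introduced here, and the constants and side conditions (for example, separability of the process) in the formalized theorems may differ.

```latex
% Gaussian Lipschitz concentration (textbook form; notation assumed here).
% Assumptions: X ~ N(0, I_n) and f : R^n -> R is L-Lipschitz in the Euclidean norm.
\[
  \mathbb{P}\bigl( \lvert f(X) - \mathbb{E} f(X) \rvert \ge t \bigr)
  \;\le\; 2 \exp\!\left( -\frac{t^{2}}{2 L^{2}} \right),
  \qquad t \ge 0 .
\]

% Dudley's entropy integral bound (textbook form; notation assumed here).
% Assumptions: (X_t)_{t \in T} is a zero-mean process that is sub-Gaussian
% with respect to a metric d on T, N(T, d, \varepsilon) is the
% \varepsilon-covering number of T, D is the diameter of T under d,
% and C is an unspecified universal constant.
\[
  \mathbb{E} \sup_{t \in T} X_t
  \;\le\; C \int_{0}^{D} \sqrt{\log N(T, d, \varepsilon)} \, d\varepsilon .
\]
```

The first bound controls how far a Lipschitz function of Gaussian noise strays from its mean; the second converts the metric complexity of an index set into a bound on the expected supremum of the process, which is the step that typically produces rates in regression applications such as the one mentioned above.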
Lay Summary: Machine learning systems are used in many important settings, but the mathematics explaining why they work can be extremely long and hard to check. In this work, we translate a major part of statistical learning theory into Lean 4, a computer system that checks mathematical proofs line by line. Our project, AI4SLT, builds new Lean tools for important probability arguments, including Gaussian concentration and Dudley’s entropy integral, which help turn messy data fluctuations into reliable prediction guarantees. The formalization is not just a computer copy of textbook proofs; it also exposes assumptions that textbooks often leave implicit. We completed the project through human-AI collaboration, in which humans planned the mathematics and AI agents helped write detailed Lean proofs. The result is a reusable, human-verified toolbox that can help researchers check, teach, and extend machine learning theory more reliably.
Originally Submitted Supplementary Material: zip
Link To Code: https://github.com/YuanheZ/lean-stat-learning-theory
Primary Area: Theory->Learning Theory
Keywords: Lean 4 Formalization, Statistical Learning Theory, Empirical Process Theory, Autoformalization
Originally Submitted PDF: pdf
Submission Number: 26915