LLM-ERM: Sample-Efficient Program Learning via LLM-Guided Search

18 Sept 2025 (modified: 20 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: sample complexity, large language models, empirical risk minimization, program synthesis
Abstract: We seek algorithms for program learning that are both sample-efficient and computationally feasible. In the realizable short-program regime, length-first (Occam/MDL) enumeration achieves near-optimal PAC rates—if the target has a length-$L$ description over alphabet $\Sigma$, finite-class ERM requires only $\mathcal{O}(L\log|\Sigma|/\epsilon)$ samples—but naïve length-first enumeration is computationally infeasible. In contrast, stochastic gradient descent (SGD) is computationally practical yet sample-inefficient. Under the statistical query (SQ) framework, iteration/sample lower bounds scale with SQ dimension, implying exponential data requirements for parities and related families even for short target programs. To address this gap, we introduce LLM-ERM, a propose-and-verify framework that replaces exhaustive enumeration with an LLM-guided search over candidate programs while retaining ERM-style selection on held-out data. Specifically, we draw $k$ candidates with a pretrained reasoning-augmented LLM, compile and check each on the data, and return the best verified hypothesis, with no feedback, adaptivity, or gradients. Theoretically, we formalize how SQ hardness transfers to SGD iteration complexity on high-SQ-dimension classes. {\em Empirically, LLM-ERM solves tasks such as parity variants, pattern matching, and primality testing with as few as 200 samples, while SGD-trained transformers overfit even with 100,000 samples}. These results indicate that language-guided program synthesis recovers much of the statistical efficiency of finite-class ERM while remaining computationally tractable, offering a practical route to learning succinct hypotheses beyond the reach of gradient-based training.
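
For context, the quoted rate follows from the standard realizable finite-class PAC bound, assuming at most $(|\Sigma|+1)^{L}$ programs of description length at most $L$ (a minimal sketch; $\delta$ is the usual failure probability, suppressed in the abstract's statement):

$$
|\mathcal{H}_L| \le (|\Sigma|+1)^{L}
\quad\Longrightarrow\quad
m(\epsilon,\delta) \;=\; \mathcal{O}\!\left(\frac{\ln|\mathcal{H}_L| + \ln(1/\delta)}{\epsilon}\right)
\;=\; \mathcal{O}\!\left(\frac{L\log|\Sigma| + \log(1/\delta)}{\epsilon}\right).
$$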
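The propose-and-verify loop described in the abstract admits a compact sketch. The snippet below is an illustrative reconstruction, not the authors' code: the sampler `propose_program` and the per-candidate entry point `predict` are hypothetical names standing in for whatever prompt/format the paper actually uses.

```python
# Minimal sketch of the LLM-ERM propose-and-verify loop: draw k candidate
# programs from an LLM, keep those that compile and fit the training sample,
# and return the one with lowest held-out error. No feedback, adaptivity,
# or gradients are used between draws, matching the abstract's description.
from typing import Callable, Optional, Sequence, Tuple

Example = Tuple[str, int]  # (input string, binary label)

def llm_erm(
    propose_program: Callable[[int], str],  # hypothetical LLM sampler: seed -> source code
    train: Sequence[Example],
    holdout: Sequence[Example],
    k: int = 64,                            # number of independent candidate draws
) -> Optional[str]:
    best_src, best_err = None, float("inf")
    for seed in range(k):                   # independent draws: no feedback between them
        src = propose_program(seed)
        try:
            scope: dict = {}
            exec(compile(src, "<candidate>", "exec"), scope)
            f = scope["predict"]            # hypothetical entry point of each candidate
        except Exception:
            continue                        # discard candidates that fail to compile/run
        if any(f(x) != y for x, y in train):
            continue                        # ERM-style filter: must fit the training data
        err = sum(f(x) != y for x, y in holdout) / max(len(holdout), 1)
        if err < best_err:                  # keep the best verified hypothesis
            best_src, best_err = src, err
    return best_src
```

Selection on a held-out split, rather than on the training sample alone, is what lets the verified candidate inherit finite-class ERM-style guarantees while the LLM merely proposes.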
Supplementary Material: zip
Primary Area: learning theory
Submission Number: 10446