Track: Track 1: Original Research/Position/Education/Attention Track
Keywords: LLM agents, algorithmic discovery, distribution-aware program learning, solver synthesis, combinatorial optimization, AI scientists
Abstract: We study a controlled setting for evaluating LLM agents as algorithmic discovery systems. Given only samples from an unknown structured distribution of problem instances, the agent must infer reusable structure, operationalize it through analysis code, and compile it into an executable solver for future instances from the same distribution. This setting isolates a basic scientific workflow: hypothesize a latent regularity, measure evidence for it, turn the result into a procedure, and validate the procedure on held-out cases.
We formalize this process through \emph{distribution-aware program learning}. The learned object is not a predictor but solver code, and the objective is not only solution quality but also deployment runtime. Our central abstraction is a \emph{solver hint}: reusable structure inferred from samples and compiled into a specialized algorithm. We prove that, for fixed solver libraries, the empirically fastest sample-consistent solver generalizes in both correctness and runtime, and that statistically identifiable hints can be recovered from polynomially many samples. A hidden SAT-backdoor model illustrates how learned structure can yield exponential per-instance speedups while preserving correctness through fallback to a complete solver.
Empirically, we instantiate the framework with LLM code agents on $21$ structured combinatorial-optimization target distributions across seven problem classes. Each candidate consists of a natural-language hypothesis, an analysis program that extracts a compact hint from public samples, and a deployment solver conditioned on that hint. The synthesized solvers reach mean normalized quality $0.970$, improving by $+0.143$ over the average heuristic pool and $+0.051$ over the highest-quality heuristic. Using geometric means of per-target runtime ratios, they run $24.7\times$ faster than the quality-best heuristic, $194.7\times$ faster than Gurobi, and $9.5\times$ faster than the selected time-limited exact backend. These results suggest that LLM agents can sometimes act as distribution-aware algorithm designers, converting sampled regularities into reusable computational procedures rather than merely producing faster implementations of generic search.
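As a rough illustration of the selection rule behind the generalization claim (choose, from a fixed solver library, the empirically fastest solver that is correct on every sample), the following minimal sketch uses a toy task: return the maximum of a list when the samples happen to be sorted. Everything named here is a hypothetical stand-in and not the paper's implementation: the solver names, the `is_correct` check, and the use of a step count as a deterministic proxy for runtime are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Optional, Tuple

Instance = List[int]
# Each solver returns (answer, cost); cost is a step count standing in for runtime.
Solver = Callable[[Instance], Tuple[int, int]]


def scan(xs: Instance) -> Tuple[int, int]:
    """Generic solver: always correct, cost grows with instance size."""
    return max(xs), len(xs)


def first(xs: Instance) -> Tuple[int, int]:
    """Specialized solver: correct only if the maximum is the first element."""
    return xs[0], 1


def sorted_tail(xs: Instance) -> Tuple[int, int]:
    """Specialized solver: correct only if the instance is sorted ascending."""
    return xs[-1], 1


def is_correct(xs: Instance, answer: int) -> bool:
    """Hypothetical verifier for the toy task."""
    return answer == max(xs)


def select_fastest_consistent(
    solvers: Dict[str, Solver],
    samples: List[Instance],
    check: Callable[[Instance, int], bool],
) -> Optional[str]:
    """Return the name of the cheapest solver that is correct on all samples."""
    best_name: Optional[str] = None
    best_cost = float("inf")
    for name, solve in solvers.items():
        total_cost = 0
        consistent = True
        for xs in samples:
            answer, cost = solve(xs)
            total_cost += cost
            if not check(xs, answer):
                consistent = False  # discard solvers that fail any sample
                break
        if consistent and total_cost < best_cost:
            best_name, best_cost = name, total_cost
    return best_name


SOLVERS: Dict[str, Solver] = {"scan": scan, "first": first, "sorted_tail": sorted_tail}
SORTED_SAMPLES: List[Instance] = [[1, 2, 3], [0, 5, 9, 12], [4, 4, 7]]
UNSORTED_SAMPLES: List[Instance] = [[2, 3, 1]]

chosen_on_sorted = select_fastest_consistent(SOLVERS, SORTED_SAMPLES, is_correct)
chosen_on_unsorted = select_fastest_consistent(SOLVERS, UNSORTED_SAMPLES, is_correct)
```

On the sorted samples the rule picks the specialized `sorted_tail` solver, since it is consistent and cheaper than `scan`; on the unsorted sample both shortcuts fail the check and the rule falls back to the generic `scan`. This mirrors, in miniature, how a hint that holds on the sampled distribution can license a faster solver while a complete solver remains available as a fallback.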
Submission Number: 295