EVIL Baseline: LLM-Discovered Heuristics can be strong Baselines for Scientific Inference

Published: 30 May 2026, Last Modified: 30 May 2026, Venue: ICML2026-AI4Science Poster, Readers: Everyone, License: CC BY 4.0
Track: Track 1: Original Research/Position/Education/Attention Track
Abstract: We explore whether LLM-guided evolutionary search can automatically discover simple, interpretable algorithms that serve as strong baselines for scientific inference. Using EVIL (EVolving Interpretable algorithms with LLMs), we evolve compact Python/NumPy programs that perform zero-shot inference on dynamical systems. Across three scientific inference tasks (temporal point process prediction, Markov jump process rate estimation, and time series imputation), the LLM-discovered heuristics are surprisingly competitive with state-of-the-art deep learning models, while being orders of magnitude faster, fully interpretable, and discoverable in minutes for under $1 of API cost. These results provide a humbling reminder that the added complexity of sophisticated models is not always necessary, and suggest that LLM-guided program search can serve as a useful tool for establishing strong baselines that help calibrate where complex approaches genuinely add value.
Keywords: LLM-Guided Program Evolution, Scientific Inference, Interpretable Machine Learning, Automated Algorithm Discovery, Dynamical Systems, Zero-Shot Generalization
Submission Number: 7