How is Occam's Razor Realized in Symbolic Regression?: An Adaptive LLM-Enhanced Genetic Programming Approach for Efficient, Versatile, and Interpretable Representation Discovery through Simplification and Evolution
Keywords: Symbolic Regression, Large Language Models (LLMs), Genetic Programming, Adaptive Scheduling
TL;DR: ALEGP: An adaptive LLM-enhanced genetic programming approach that dynamically integrates large language models with multi-island evolutionary search to address bloat, premature convergence, and local optima in symbolic regression.
Abstract: Symbolic regression aims to discover mathematical expressions that capture underlying data relationships, but genetic programming (GP) approaches commonly encounter bloat, premature convergence, and inadequate expression simplification mechanisms. We propose ALEGP (Adaptive LLM-Enhanced Genetic Programming), a framework that strategically integrates large language models (LLMs) with evolutionary computation to address these interconnected challenges.
ALEGP incorporates three key components: (i) a multi-island evolutionary architecture employing specialized subpopulations with distinct optimization objectives to maintain population diversity, (ii) a context-aware intervention scheduler that triggers LLM assistance based on real-time evolutionary indicators including fitness stagnation, diversity loss, and expression bloat, and (iii) an island-specific integration protocol that reincorporates LLM-refined expressions while preserving beneficial evolutionary dynamics. This design enables targeted simplification of complex expressions, improved generalization performance, and reduced computational overhead through adaptive LLM utilization.
Experiments on eight synthetic benchmark functions and five real-world regression datasets demonstrate that ALEGP achieves superior accuracy and interpretability while requiring 50–60% fewer LLM interventions than fixed-schedule strategies. Ablation studies validate the necessity of both adaptive scheduling and multi-island design for robust performance. These results establish ALEGP as an effective framework for resource-efficient symbolic regression, demonstrating principled integration of evolutionary algorithms with large language models. Code is provided as supplementary material.
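To make component (ii) of the abstract concrete, the following is a minimal sketch of how a trigger check over the three named indicators (fitness stagnation, diversity loss, expression bloat) might look for a single island. It is not the authors' implementation: the names `SchedulerConfig` and `fired_indicators`, all threshold values, and the choice of "fraction of unique expression strings" as the diversity measure are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class SchedulerConfig:
    # All values below are hypothetical; the abstract gives no thresholds.
    stagnation_window: int = 10    # generations inspected for progress
    min_improvement: float = 1e-6  # smallest error reduction counted as progress
    min_diversity: float = 0.3     # minimum fraction of unique expressions
    max_mean_size: float = 30.0    # maximum mean expression size (node count)


def fired_indicators(best_errors: Sequence[float],
                     expressions: Sequence[str],
                     sizes: Sequence[int],
                     cfg: Optional[SchedulerConfig] = None) -> List[str]:
    """Return the names of the indicators that fire for one island.

    best_errors: best error per generation (lower is better), oldest first.
    expressions: string form of each individual in the current population.
    sizes:       node count of each individual in the current population.
    """
    cfg = cfg or SchedulerConfig()
    fired: List[str] = []

    # Fitness stagnation: the last `stagnation_window` generations did not
    # beat the best error seen before that window by `min_improvement`.
    w = cfg.stagnation_window
    if len(best_errors) > w:
        before = min(best_errors[:-w])
        recent = min(best_errors[-w:])
        if before - recent < cfg.min_improvement:
            fired.append("stagnation")

    # Diversity loss: too few distinct expressions in the population.
    if expressions and len(set(expressions)) / len(expressions) < cfg.min_diversity:
        fired.append("diversity_loss")

    # Expression bloat: mean expression size exceeds the cap.
    if sizes and sum(sizes) / len(sizes) > cfg.max_mean_size:
        fired.append("bloat")

    return fired
```

Under this sketch, an LLM call would be issued only when the returned list is non-empty, which is one plausible way an adaptive schedule could need fewer interventions than a fixed one. The returned indicator names could also be used to choose an island-specific prompt, for example asking for simplification when only `"bloat"` fires.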
Supplementary Material: zip
Primary Area: optimization
Submission Number: 10284