LLM-based Symbolic Regression with Tool-Augmented Multi-Objective Optimization

Boxiao Wang; Runxiang Wang; Kai Li; Yifan Zhang; Jian Cheng

17 Sept 2025 (modified: 11 Feb 2026). Submitted to ICLR 2026. License: CC BY 4.0

Keywords: Scientific Equation Discovery, Symbolic Regression, LLM

Abstract: Symbolic Regression (SR) aims to discover analytical equations from observational data and plays a central role in scientific modeling. While recent Large Language Model (LLM) based approaches show promise, they face two key limitations. First, they lack dedicated data analysis mechanisms to uncover variable dependencies, which reduces the efficiency of equation discovery. Second, most methods rely on single-objective evaluation focused solely on fitting error. This neglect of structural complexity and generalization often causes models to converge prematurely to local optima, limiting their ability to explore the broader equation space. To address these issues, we propose Tool-Augmented Multi-Objective Symbolic Regression (TAMOSR), a unified framework that integrates external analytical tools (e.g., correlation analysis, mutual information, periodicity detection) to extract structural priors and guide equation generation, while simultaneously optimizing for accuracy, complexity, and generalization via a multi-objective evaluation module with a dynamic Pareto front. TAMOSR employs two collaborative LLM modules: a Meta Strategy Generator, which selects tools and synthesizes structural optimization strategies based on Pareto-optimal equations, and an Equation Generator, which produces new candidate equations accordingly. The system operates in a closed loop, continuously refining both strategies and equation structures. Experiments on diverse scientific benchmarks demonstrate that TAMOSR outperforms existing SR methods in accuracy, generalization, and search efficiency, offering a scalable and adaptable paradigm for scientific discovery.
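The dynamic Pareto front mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify how TAMOSR maintains its front, so everything below is an assumption made for illustration: the `Candidate` record, the `dominates` and `update_front` helpers, the choice to minimize all three objectives, and the numeric scores are hypothetical and are not taken from the paper.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """Hypothetical record for one candidate equation (all objectives minimized)."""
    expr: str          # equation as a string
    fit_error: float   # error on the fitting data
    complexity: float  # structural complexity, e.g. node count
    gen_error: float   # error on held-out data, a proxy for generalization


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if `a` is no worse than `b` on every objective and strictly better on one."""
    a_obj, b_obj = a[1:], b[1:]
    no_worse = all(x <= y for x, y in zip(a_obj, b_obj))
    strictly_better = any(x < y for x, y in zip(a_obj, b_obj))
    return no_worse and strictly_better


def update_front(front: List[Candidate], new: Candidate) -> List[Candidate]:
    """Insert `new` into the front unless it is dominated; drop members it dominates."""
    if any(dominates(old, new) for old in front):
        return front
    return [old for old in front if not dominates(new, old)] + [new]


# Illustrative scores only; these are not results from the paper.
front: List[Candidate] = []
for cand in [
    Candidate("x", 0.5, 1, 0.6),
    Candidate("x + sin(x)", 0.1, 4, 0.2),
    Candidate("x*x + x", 0.6, 3, 0.7),
    Candidate("sin(x)", 0.1, 2, 0.2),
]:
    front = update_front(front, cand)

print([c.expr for c in front])
```

In this sketch a simple equation with a higher error can stay on the front next to a more accurate but more complex one, which is the kind of diversity that a single fitting-error objective would discard.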

Primary Area: other topics in machine learning (i.e., none of the above)

Submission Number: 8147
