Finetuning Large Language Model as an Effective Symbolic Regressor

09 Sept 2025 (modified: 27 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Benchmark, Scientific Discovery, Large Language Models, Symbolic Regression
Abstract: Deriving governing equations from observational data, known as Symbolic Regression (SR), is a cornerstone of scientific discovery. Large Language Models (LLMs) have shown promise in this task by leveraging their vast cross-disciplinary scientific knowledge. However, existing LLM-based methods rely primarily on direct inference or prompt engineering, often requiring excessive inference iterations to converge on correct formulas or failing to handle complex equation targets. These limitations in effectiveness and generalization stem from a fundamental mismatch between the general-purpose pre-training of LLMs, which favors approximate reasoning, and the high-precision, specialized demands of SR, a problem exacerbated by the scarcity of high-quality, task-specific data. To bridge this gap, we propose fine-tuning LLMs for enhanced SR capability. Yet the absence of dedicated datasets for SR-oriented fine-tuning remains a critical barrier. We therefore introduce SymbArena, a benchmark specifically engineered to optimize LLMs for SR. It comprises 148,102 diverse equations formulated as corpora of 1.83 billion tokens for LLM use, enabling effective training and inference. Furthermore, SymbArena proposes a heuristic metric that precisely quantifies form-level consistency between expressions, going beyond existing numerically oriented SR evaluation strategies. With this benchmark, we explore mainstream LLM fine-tuning techniques for SR and establish SymbolicChat, a simple yet effective LLM-based SR baseline.
Experimental results validate SymbolicChat as the first LLM-based method to surpass traditional numerical approaches in both numerical precision and symbolic form accuracy, outperforming the second-best LLM baseline with a 2-fold gain in R^2 score and an 8.37% improvement in form-level consistency score.
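To illustrate the two evaluation axes the abstract contrasts, here is a minimal, self-contained sketch. The helper names (`r2_score`, `same_form`) and the toy string-based form check are illustrative assumptions, not the paper's actual metric; a real form-level consistency metric would compare canonicalized expression trees rather than normalized strings.

```python
# Sketch contrasting numerical fit (R^2) with a naive form-level check.
# Helper names and the string-based comparison are illustrative only.

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def same_form(expr_a, expr_b):
    """Toy structural check: compare whitespace-normalized strings.
    A real metric would canonicalize and compare expression trees."""
    canon = lambda s: s.replace(" ", "")
    return canon(expr_a) == canon(expr_b)

# Two candidates fit the target y = 2x almost equally well numerically,
# but only one matches the target's symbolic form.
xs = [1.0, 2.0, 3.0, 4.0]
y_true = [2.0 * x for x in xs]
pred_good_form = [2.0 * x for x in xs]         # candidate "2*x"
pred_wrong_form = [1.9 * x + 0.2 for x in xs]  # candidate "1.9*x + 0.2"

print(round(r2_score(y_true, pred_good_form), 3))   # 1.0
print(round(r2_score(y_true, pred_wrong_form), 3))  # ~0.997: high R^2...
print(same_form("2*x", "2 * x"))                    # True
print(same_form("2*x", "1.9*x + 0.2"))              # False: ...wrong form
```

The point of the toy example: a purely numerical metric scores both candidates nearly identically, while a form-level metric separates them, which is why the benchmark evaluates both.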
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 3423