Graph-based Symbolic Regression with Invariance and Constraint Encoding

Published: 18 Sept 2025, Last Modified: 29 Oct 2025, NeurIPS 2025 poster, License: CC BY-NC 4.0
Keywords: Symbolic Regression, Expression Graph, Monte-Carlo Tree Search, Expression Equivalency, Permutation Invariance, Constrained Search
TL;DR: Graph-based symbolic regression that captures expression equivalences on graph representations and incorporates constrained search via hybrid neural-guided Monte-Carlo tree search.
Abstract: Symbolic regression (SR) seeks interpretable analytical expressions that uncover the governing relationships within data, providing mechanistic insight beyond 'black-box' models. However, existing SR methods often suffer from two key limitations: (1) *redundant representations* that fail to capture mathematical equivalences and higher-order operand relations, breaking permutation invariance and hindering efficient learning; and (2) *sparse rewards* caused by incomplete incorporation of constraints that can only be evaluated on full expressions, such as constant fitting or physical-law verification. To address these challenges, we propose a unified framework, **Graph-based Symbolic Regression (GSR)**. GSR compresses the search space through a permutation-invariant representation, Expression Graphs (EGs), which intrinsically encode expression equivalences via a term-rewriting system (TRS) and a directed acyclic graph (DAG) structure. It mitigates reward sparsity by employing hybrid neural-guided Monte-Carlo tree search (hnMCTS) on EGs, where constraint-informed neural guidance enables direct incorporation of expression-level constraint priors and an adaptive $\epsilon$-UCB policy balances exploration and exploitation. Theoretical analyses establish the uniqueness of the proposed EG representation and the convergence of the hnMCTS algorithm. Experiments on synthetic and real-world scientific datasets demonstrate the efficiency and accuracy of GSR in discovering underlying expressions and adhering to physical laws, offering practical solutions for scientific discovery.
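The abstract's central representational claim is that equivalent expressions such as $x + y$ and $y + x$ should collapse to a single object. The following is a minimal Python sketch of that general idea, not the authors' implementation. It assumes only two rewrite rules, commutativity and associativity of `add` and `mul` (the paper's actual term-rewriting system is not described here and is likely richer). Nested associative operators are flattened, operands of commutative operators are sorted, and the result is hash-consed so that identical canonical subterms share one DAG node. The class `ExprDAG`, the constant `COMMUTATIVE_ASSOCIATIVE`, and the tuple encoding of expressions are hypothetical names chosen for illustration.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical encoding for this sketch: a leaf is a symbol string like "x",
# an internal node is a tuple (op, child1, child2, ...).
Expr = Union[str, Tuple]

# Assumed rewrite rules: these operators are treated as commutative and associative.
COMMUTATIVE_ASSOCIATIVE = {"add", "mul"}


class ExprDAG:
    """Hash-consed DAG: structurally identical canonical subterms share one node id."""

    def __init__(self) -> None:
        self.table: Dict[Tuple, int] = {}  # canonical key -> node id
        self.nodes: List[Tuple] = []       # node id -> canonical key

    def _intern(self, key: Tuple) -> int:
        # Reuse an existing node if this canonical key was seen before.
        if key not in self.table:
            self.table[key] = len(self.nodes)
            self.nodes.append(key)
        return self.table[key]

    def add(self, expr: Expr) -> int:
        """Canonicalize `expr` bottom-up and return its node id."""
        if isinstance(expr, str):
            return self._intern(("leaf", expr))
        op, *args = expr
        child_ids = [self.add(a) for a in args]
        if op in COMMUTATIVE_ASSOCIATIVE:
            # Rule 1 (flattening): op(op(a, b), c) -> op(a, b, c).
            flat: List[int] = []
            for cid in child_ids:
                ckey = self.nodes[cid]
                if ckey[0] == op:
                    flat.extend(ckey[1])
                else:
                    flat.append(cid)
            # Rule 2 (ordering): sort operand ids so permutations coincide.
            child_ids = sorted(flat)
        # Non-commutative operators (e.g. "sub") keep their operand order.
        return self._intern((op, tuple(child_ids)))


dag = ExprDAG()
a = dag.add(("add", "x", ("mul", "y", "z")))   # x + y*z
b = dag.add(("add", ("mul", "z", "y"), "x"))   # z*y + x
c = dag.add(("add", ("add", "x", "y"), "z"))   # (x + y) + z
d = dag.add(("add", "x", ("add", "z", "y")))   # x + (z + y)
print(a == b)  # True
print(c == d)  # True
print(dag.add(("sub", "x", "y")) == dag.add(("sub", "y", "x")))  # False
```

In this sketch, operand order is normalized before a node is interned. As a result, a search procedure that builds expressions through `ExprDAG` would not treat permutations or regroupings of the same sum or product as distinct states. This illustrates the kind of search-space compression the abstract attributes to EGs. Shared subexpressions such as a repeated `("sin", "x")` also map to a single node, which is what makes the structure a DAG rather than a tree.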
Primary Area: Machine learning for sciences (e.g. climate, health, life sciences, physics, social sciences)
Submission Number: 19027