Keywords: Symbolic Regression, symbolic equivalence, Monte Carlo Tree Search, Deep Reinforcement Learning, Large Language Model
TL;DR: We introduce EGG-SR, a unified framework that integrates equality graphs (e-graphs) into diverse symbolic regression algorithms.
Abstract: Symbolic regression seeks to uncover physical laws from experimental data by searching for closed-form expressions, an important task in AI-driven scientific discovery. Yet the exponential growth of the expression search space renders the task computationally challenging.
A promising yet underexplored direction for reducing the effective search space and accelerating training lies in *symbolic equivalence*: many expressions, although syntactically different, define the same function—for example, $\log(x_1^2x_2^3)$, $\log(x_1^2)+\log(x_2^3)$, and $2\log(x_1)+3\log(x_2)$.
Existing algorithms treat such variants as distinct outputs, leading to redundant exploration and slow learning.
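The functional equivalence of the three syntactically distinct expressions above can be checked numerically. The sketch below is only an illustration of the abstract's example (the function names `f1`–`f3` are ours, not from the paper); it evaluates all three forms at random positive inputs and confirms they agree.

```python
import math
import random

# Three syntactically different expressions that define the same function
# (the example from the abstract); valid for positive x1, x2.
def f1(x1, x2): return math.log(x1**2 * x2**3)
def f2(x1, x2): return math.log(x1**2) + math.log(x2**3)
def f3(x1, x2): return 2*math.log(x1) + 3*math.log(x2)

random.seed(0)
for _ in range(100):
    a, b = random.uniform(0.1, 10.0), random.uniform(0.1, 10.0)
    assert math.isclose(f1(a, b), f2(a, b), rel_tol=1e-9)
    assert math.isclose(f1(a, b), f3(a, b), rel_tol=1e-9)
```

A symbolic-regression search that scores these three candidates separately repeats the same work three times; recognizing the equivalence collapses them into one.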
We introduce EGG-SR, a unified framework that integrates equality graphs (e-graphs) into diverse symbolic regression algorithms, including Monte Carlo Tree Search (MCTS), deep reinforcement learning (DRL), and large language models (LLMs).
EGG-SR compactly represents equivalent expressions through the proposed EGG module, enabling more efficient learning by:
(1) pruning redundant subtree exploration in EGG-MCTS,
(2) aggregating rewards across equivalence classes in EGG-DRL, and
(3) enriching feedback prompts in EGG-LLM.
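To make the e-graph idea concrete, here is a minimal sketch of how an equality graph groups syntactically different expressions into one equivalence class via hash-consing and union-find. This is a hypothetical illustration, not the EGG-SR implementation; full congruence-closure maintenance after merges is omitted for brevity.

```python
class EGraph:
    """Minimal e-graph sketch: hash-consed nodes grouped into
    equivalence classes with union-find (illustration only)."""

    def __init__(self):
        self.parent = []  # union-find parent array over e-class ids
        self.memo = {}    # canonical node -> e-class id (hash-consing)

    def find(self, i):
        # Path-halving union-find lookup.
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]
            i = self.parent[i]
        return i

    def add(self, op, *children):
        # Canonicalize children to their class representatives.
        node = (op, tuple(self.find(c) for c in children))
        if node in self.memo:
            return self.find(self.memo[node])
        cid = len(self.parent)
        self.parent.append(cid)
        self.memo[node] = cid
        return cid

    def merge(self, a, b):
        # Declare two e-classes equal (e.g. after applying a rewrite rule).
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra
        return ra

eg = EGraph()
x1, x2 = eg.add("x1"), eg.add("x2")
two, three = eg.add("2"), eg.add("3")
# log(x1^2 * x2^3)
lhs = eg.add("log", eg.add("*", eg.add("pow", x1, two),
                                 eg.add("pow", x2, three)))
# 2*log(x1) + 3*log(x2)
rhs = eg.add("+", eg.add("*", two, eg.add("log", x1)),
                  eg.add("*", three, eg.add("log", x2)))
# A log-expansion rewrite rule would justify merging the two classes.
eg.merge(lhs, rhs)
assert eg.find(lhs) == eg.find(rhs)  # now one equivalence class
```

Once the two forms share an e-class, a search procedure (MCTS node expansion, a DRL reward update, or an LLM feedback prompt) can treat them as a single candidate rather than two.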
Theoretically, we establish that embedding e-graphs tightens the regret bound of MCTS and reduces the variance of the DRL gradient estimator.
Empirically, EGG-SR consistently improves modern symbolic regression algorithms across multiple benchmarks, discovering equations with lower normalized mean squared error.
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 10593