Structure Guided Equation Discovery with Influence-Based Feedback for Large Language Models

ICLR 2026 Conference Submission 19428 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: LLMs for science, equation discovery
TL;DR: We propose SGED, an approach using LLMs for equation discovery with granular influence function feedback
Abstract: Large Language Models (LLMs) hold significant promise for scientific discovery, particularly in identifying interpretable, closed-form equations from complex data. However, existing LLM-driven approaches often rely on coarse, scalar feedback (e.g., overall Mean Squared Error), limiting the LLM's ability to discern the individual contributions of components within a proposed equation. This forces the LLM to rely heavily on its priors or engage in inefficient trial-and-error exploration. We introduce *Structure Guided Equation Discovery* (SGED), a novel framework where LLMs act as dual agents in an iterative symbolic modeling pipeline. An LLM agent first proposes candidate basis functions $\psi_j(x)$ for a linear symbolic model $f(x) = \sum_j w_j \psi_j(x)$. A second LLM agent then refines this set of terms, critically guided by detailed, per-term *influence scores* $\Delta_j$ and fitted weights $w_j$. These scores quantify each basis function's contribution to predictive accuracy, providing the crucial granular feedback needed for effective model refinement. SGED can operate as a direct iterative refinement loop or be integrated into Monte Carlo Tree Search (MCTS) for a more comprehensive exploration of the equation space. We demonstrate that providing LLMs with this structured, influence-based feedback improves the accuracy of discovered equations and the efficiency of the discovery process on diverse biological and synthetic datasets. SGED highlights the broader principle that equipping LLMs with detailed, interpretable feedback about sub-components of their generative output can unlock more sophisticated reasoning and self-improvement capabilities.
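
A minimal sketch of the per-term feedback the abstract describes, assuming an ordinary least-squares fit of the weights $w_j$ and approximating each influence score $\Delta_j$ by the increase in mean squared error when term $j$ is ablated and the remaining weights are refit; function names such as `influence_scores` are illustrative assumptions, not the authors' implementation:

```python
# Sketch (under stated assumptions): fit f(x) = sum_j w_j * psi_j(x) by least
# squares and compute a leave-one-term-out proxy for the influence scores.
import numpy as np

def fit_weights(Psi, y):
    """Least-squares fit of the linear weights w_j over the design matrix Psi."""
    w, *_ = np.linalg.lstsq(Psi, y, rcond=None)
    return w

def influence_scores(basis_fns, X, y):
    """Return fitted weights w_j and ablation-based influence scores Delta_j."""
    Psi = np.column_stack([psi(X) for psi in basis_fns])   # one column per term
    w = fit_weights(Psi, y)
    base_mse = np.mean((Psi @ w - y) ** 2)
    deltas = []
    for j in range(Psi.shape[1]):
        Psi_minus = np.delete(Psi, j, axis=1)               # drop term j
        w_minus = fit_weights(Psi_minus, y)
        mse_minus = np.mean((Psi_minus @ w_minus - y) ** 2)
        deltas.append(mse_minus - base_mse)                 # Delta_j: loss increase
    return w, np.array(deltas)

if __name__ == "__main__":
    # Toy data with two LLM-proposed candidate basis functions.
    rng = np.random.default_rng(0)
    X = rng.uniform(-2, 2, size=200)
    y = 1.5 * np.sin(X) + 0.1 * rng.normal(size=200)
    w, deltas = influence_scores([np.sin, np.cos], X, y)
    print("weights:", w, "influence:", deltas)
```

In the full SGED loop, these weights and influence scores would be serialized into the prompt for the refining LLM agent, which uses the per-term signal to decide which basis functions to keep, drop, or modify.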
Primary Area: neurosymbolic & hybrid AI systems (physics-informed, logic & formal reasoning, etc.)
Submission Number: 19428