Parsing the Language of Expressions: Enhancing Symbolic Regression with Domain-Aware Symbolic Priors
Keywords: Symbolic regression, Reinforcement learning, Recurrent neural network, Domain knowledge prior
Abstract: Symbolic regression is pivotal for discovering interpretable expressions that unravel complex phenomena by revealing the underlying mathematical and physical relationships within data. In this paper, we introduce an enhanced symbolic regression method that integrates symbol priors derived from diverse scientific domains, including physics, biology, chemistry, and engineering, into the regression process. By organizing and analyzing domain-specific expressions, we examine the probability distributions of symbols across different topics. We introduce a novel tree-structured recurrent neural network (RNN) infused with these symbol priors to guide the learning process with domain knowledge. Our approach uses a new tree structure to represent expressions, in which unary operators connected by the same binary operator are positioned at the same hierarchical level. By analyzing the combinations of symbols at different heights and levels within the tree, we can examine symbol priors across the entire hierarchical structure, effectively incorporating the structural information of expressions into the regression process. Additionally, we compile characteristic expression blocks from each domain and add them to the operator dictionary during training, expediting learning by providing relevant building blocks. Experimental results demonstrate that incorporating symbol priors significantly boosts the performance of symbolic regression methods; in particular, it improves the efficiency with which reinforcement learning algorithms obtain optimal policies. Our findings confirm that leveraging domain-specific symbol priors not only hastens convergence but also yields more accurate and interpretable models, effectively bridging the gap between data-driven learning and domain expertise in symbolic regression.
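To make the core idea concrete, the following is a minimal sketch (not the paper's actual architecture) of how a domain-specific symbol prior could bias the symbol sampled at each expression-tree node. The prior values in `PHYSICS_PRIOR`, the `sample_symbol` helper, and the mock decoder logits are all illustrative assumptions, not quantities from the submission.

```python
import math
import random

# Hypothetical symbol prior for a "physics" domain (illustrative numbers,
# not taken from the paper): trigonometric operators are weighted up.
PHYSICS_PRIOR = {"add": 0.25, "mul": 0.25, "sin": 0.20, "cos": 0.20, "exp": 0.10}

def sample_symbol(logits, prior, temperature=1.0):
    """Sample the next expression-tree symbol with probability proportional
    to exp(logit_s / T) * prior_s, i.e. model scores biased by the prior."""
    symbols = list(prior)
    weights = [math.exp(logits[s] / temperature) * prior[s] for s in symbols]
    return random.choices(symbols, weights=weights)[0]

# Mock logits standing in for a tree-RNN decoder's output at one node.
logits = {"add": -5.0, "mul": -5.0, "sin": 5.0, "cos": -5.0, "exp": -5.0}
print(sample_symbol(logits, PHYSICS_PRIOR))
```

In a full system the prior would be estimated from a corpus of domain expressions and could condition on the node's height and level in the tree, as the abstract describes.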
Supplementary Material: pdf
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 12428