Boosting Recovery in Transformer-Based Symbolic Regression

Henrik Voigt; Paul Kahlmeyer; Kai Lawonn; Michael Habeck; Joachim Giesen

26 Sept 2024 (modified: 18 Jan 2025), ICLR 2025 Conference Withdrawn Submission, CC BY 4.0

Keywords: symbolic regression, interpretability, transformer, recovery

TL;DR: We show that the recovery performance of the end-to-end symbolic regression approach can be improved by carefully selecting the training data.

Abstract: The traditional objective in regression is generalization, that is, learning a function from training data that performs well beyond the training data. Symbolic regression adds another objective, namely interpretability of the regressor. In the context of regression, interpretability means that the representation of the regressor facilitates insights into the mechanisms that underlie the functional dependence. State-of-the-art symbolic regressors provide such insights, but they predominantly incur high costs at inference time. The recently proposed transformer-based end-to-end approach is orders of magnitude faster at inference time. It does not, however, achieve state-of-the-art performance in terms of interpretability, which is typically measured by the ability to recover ground truth formulas from samples. Here, we show that the recovery performance of the end-to-end approach can be boosted by carefully selecting the training data. We construct a synthetic dataset from first principles and demonstrate that the capacity to recover ground truth formulas is proportional to the available computational resources.

Supplementary Material: pdf

Primary Area: interpretability and explainable AI

Submission Number: 8012
