Symbolic Regression is NP-hard

Published: 25 Oct 2022, Last Modified: 28 Feb 2023, Accepted by TMLR
Abstract: Symbolic regression (SR) is the task of learning a model of data in the form of a mathematical expression. By their nature, SR models have the potential to be accurate and human-interpretable at the same time. Unfortunately, finding such models, i.e., performing SR, appears to be a computationally intensive task. Historically, SR has been tackled with heuristics such as greedy or genetic algorithms and, while some works have hinted at the possible hardness of SR, no proof has yet been given that SR is, in fact, NP-hard. This raises the question: is there an exact polynomial-time algorithm to compute SR models? We provide evidence suggesting that the answer is probably negative by showing that SR is NP-hard.
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: Minor textual improvements for camera ready.
Assigned Action Editor: ~Swarat_Chaudhuri1
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Number: 280