Keywords: symbolic regression for scientific discovery, physics, datasets
TL;DR: We propose new datasets to discuss the performance of symbolic regression for scientific discovery (SRSD).
Abstract: Symbolic Regression (SR) is a task of recovering mathematical expressions from given data and has been attracting attention from the research community to discuss its potential for scientific discovery. However, the community lacks datasets of symbolic regression for scientific discovery (SRSD) to discuss the potential of SR. To address the critical issue, we revisit datasets of SRSD to discuss the potential of symbolic regression for scientific discovery. Focused on a set of formulas used in the existing datasets based on Feynman Lectures on Physics, we recreate 120 datasets to discuss the performance of SRSD. For each of the 120 SRSD datasets, we carefully review the properties of the formula and its variables to design reasonably realistic sampling ranges of values so that our new SRSD datasets can be used for evaluating the potential of SRSD such as whether or not an SR method can (re)discover physical laws from such datasets. We conduct experiments on our new SRSD datasets using five state-of-the-art SR methods in SRBench, and the results show that the new SRSD datasets are more challenging than the original ones. Our datasets and code repository are publicly available.