Abstract: Symbolic Regression (SR) is the task of finding closed-form analytical expressions that describe the relationship between variables in a dataset. In this work, we rethink SR and introduce mechanisms from two perspectives: morphology and adaptability. Morphology: Man-made heuristics are typically used in SR algorithms to influence the morphology (or structure) of candidate expressions, potentially introducing unintentional bias and data leakage. To address this issue, we create a depth-aware mathematical language model trained on terminal walks of expression trees, as a replacement for these heuristics. Adaptability: We promote alternating fitness functions across generations, which eliminates equations that perform well under only one fitness function and, as a result, discovers expressions that are closer to the true functional form. We demonstrate this by alternating fitness functions that quantify faithfulness to values (via MSE) and to empirical derivatives (via a novel, theoretically justified fitness metric that we call MSEDI). Proof-of-concept: We combine these ideas into a minimalistic evolutionary SR algorithm that outperforms a suite of benchmark and state-of-the-art SR algorithms on problems with unknown constants added, which we claim are more reflective of SR performance in real-world applications. Our claim is then strengthened by reproducing the superior performance on real-world regression datasets from SRBench. This Hot-off-the-Press paper summarizes the work K.S. Fong, S. Wongso and M. Motani, "Rethinking Symbolic Regression: Morphology and Adaptability in the Context of Evolutionary Algorithms", The Eleventh International Conference on Learning Representations (ICLR'23).
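The idea of alternating fitness functions across generations can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not define MSEDI, so the derivative-based score below is a plain finite-difference stand-in, and all names (`mse_values`, `mse_derivatives`, `pick_fitness`, `score`) are hypothetical. The toy target y = x^2 and the three candidates are likewise assumptions made only for illustration.

```python
import numpy as np

def mse_values(pred, y):
    """Mean squared error between predicted and observed values."""
    return float(np.mean((pred - y) ** 2))

def mse_derivatives(pred, y, x):
    """Stand-in derivative score (NOT the paper's MSEDI): mean squared
    error between finite-difference derivatives of prediction and data."""
    d_pred = np.gradient(pred, x)
    d_true = np.gradient(y, x)
    return float(np.mean((d_pred - d_true) ** 2))

def pick_fitness(generation):
    """Alternate the fitness function: even generations score values,
    odd generations score empirical derivatives."""
    return "values" if generation % 2 == 0 else "derivatives"

def score(candidate, x, y, generation):
    """Score one candidate expression under the fitness of this generation."""
    pred = candidate(x)
    if pick_fitness(generation) == "values":
        return mse_values(pred, y)
    return mse_derivatives(pred, y, x)

# Toy data sampled from the assumed target y = x^2.
x = np.linspace(0.0, 2.0, 50)
y = x ** 2

# Hypothetical candidate expressions.
candidates = {
    "x**2": lambda t: t ** 2,            # correct functional form
    "x**2 + 3": lambda t: t ** 2 + 3.0,  # right derivative, wrong values
    "2*x": lambda t: 2.0 * t,            # wrong form
}

# A candidate survives only if it scores well under both fitness functions,
# so take the worse of its two scores over consecutive generations.
worst = {
    name: max(score(f, x, y, 0), score(f, x, y, 1))
    for name, f in candidates.items()
}
best_name = min(worst, key=worst.get)
```

In this toy setting, `x**2 + 3` matches the derivatives of the data but not its values, so it would pass a derivative-only selection and fail once the fitness switches back to values. Only the candidate that does well under both scores remains competitive.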