Enhancing Prediction, Explainability, Inference and Robustness of Decision Trees via Symbolic Regression-Discovered Splits
Abstract: We introduce a hybrid machine learning algorithm that uses Genetic Programming-based Symbolic Regression (SR) to construct decision trees (DT) with enhanced prediction, explainability, inference speed, and robustness. Conventional DT algorithms for classification tasks are limited to axis-parallel splits; when the true decision boundaries do not align with the feature axes, DT is therefore likely to produce unnecessarily complex structures. In this work, we introduce SR-Enhanced DT (SREDT), which uses SR to enrich the class of splits available to DT. We evaluate SREDT on both synthetic and real-world datasets. Despite its simplicity, our approach yields remarkably compact trees that surpass DT and its variant, oblique DT (ODT), in supervised classification tasks in terms of accuracy and F-score. SREDT trees have low depth and few leaves and terms, which increases explainability. SREDT also performs faster inference, even compared to DT, and demonstrates the highest robustness to noise among the compared methods. Furthermore, despite being a small white-box model, SREDT is competitive with large black-box tabular classification algorithms, including tree ensembles and deep models. This Hot-off-the-Press paper summarizes the work K.S. Fong and M. Motani, "Symbolic Regression Enhanced Decision Trees for Classification Tasks", the Annual AAAI Conference on Artificial Intelligence (AAAI'24).
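To make the core idea concrete, the sketch below contrasts an axis-parallel split with a split on a symbolic expression. It is a minimal illustration, not the authors' implementation: the genetic-programming SR search is replaced by a small hand-written candidate set, and the split criterion (weighted Gini impurity) is an assumed stand-in for whatever criterion SREDT uses internally.

```python
# Minimal sketch of the SREDT idea: at a tree node, instead of restricting
# splits to axis-parallel tests x_j <= t, search over symbolic expressions
# f(x) and threshold on f(x) <= t.  The GP-based SR search is replaced here
# by a hand-written candidate set, purely for illustration.

import numpy as np

def gini(y):
    """Gini impurity of an integer label vector."""
    if len(y) == 0:
        return 0.0
    p = np.bincount(y) / len(y)
    return 1.0 - np.sum(p ** 2)

def best_threshold_split(values, y):
    """Best threshold on a 1-D feature `values`, by weighted Gini impurity."""
    best_score, best_t = np.inf, None
    for t in np.unique(values):
        left, right = y[values <= t], y[values > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if score < best_score:
            best_score, best_t = score, t
    return best_score, best_t

# Toy data whose true boundary x0 * x1 <= 1 is not axis-aligned.
rng = np.random.default_rng(0)
X = rng.uniform(0, 2, size=(400, 2))
y = (X[:, 0] * X[:, 1] <= 1.0).astype(int)

# Axis-parallel candidates (what a conventional DT considers) versus a
# symbolic candidate standing in for a GP/SR-discovered expression.
candidates = {
    "x0":      X[:, 0],
    "x1":      X[:, 1],
    "x0 * x1": X[:, 0] * X[:, 1],   # SR-style discovered split feature
}

for name, values in candidates.items():
    score, t = best_threshold_split(values, y)
    print(f"split on {name:8s} <= {t:.3f}  ->  weighted Gini {score:.3f}")
# The symbolic split x0 * x1 <= t separates the two classes at a single
# node, whereas axis-parallel splits would require a much deeper tree.
```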