TL;DR: We find well-performing sparse decision trees, dramatically improving scalability while maintaining state-of-the-art accuracy.
Abstract: Decision tree optimization is fundamental to interpretable machine learning. The most popular approach is to greedily search for the best feature at every decision point, which is fast but provably suboptimal. Recent approaches find the global optimum using branch and bound with dynamic programming, yielding substantial improvements in accuracy and sparsity at great cost to scalability. An ideal solution would have the accuracy of an optimal method and the scalability of a greedy method. We introduce a family of algorithms called SPLIT (SParse Lookahead for Interpretable Trees) that moves us significantly toward this ideal balance. We demonstrate that not all subproblems need to be solved to optimality to find high-quality trees; greediness suffices near the leaves. Since each additional level of depth adds an exponential number of possible trees, this change makes our algorithms orders of magnitude faster than existing optimal methods, with negligible loss in performance. We extend this algorithm to allow scalable computation of sets of near-optimal trees (i.e., the Rashomon set).
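The core idea can be sketched in a few lines: search exhaustively over splits for the first few levels of the tree, then fall back to greedy splitting below that frontier. The following is a minimal illustrative sketch under strong assumptions (binary 0/1 features, a raw misclassification-count objective, no branch-and-bound pruning, caching, or sparsity penalty), not the authors' implementation; see the linked repository for the actual SPLIT code.

```python
import numpy as np

def misclassified(y):
    # Errors made by a leaf that predicts the majority label of y.
    if len(y) == 0:
        return 0
    ones = int(y.sum())
    return min(ones, len(y) - ones)

def greedy_error(X, y, depth):
    # Plain greedy splitting: pick the feature with the best immediate
    # (one-step) error, then recurse independently on each child.
    base = misclassified(y)
    if depth == 0 or base == 0:
        return base
    scores = [misclassified(y[X[:, j] == 0]) + misclassified(y[X[:, j] == 1])
              for j in range(X.shape[1])]
    j = int(np.argmin(scores))
    left = X[:, j] == 0
    return min(base,
               greedy_error(X[left], y[left], depth - 1)
               + greedy_error(X[~left], y[~left], depth - 1))

def lookahead_error(X, y, depth, lookahead):
    # Exhaustive search over the top `lookahead` levels; below that
    # frontier, greediness suffices near the leaves.
    base = misclassified(y)
    if depth == 0 or base == 0:
        return base
    if lookahead == 0:
        return greedy_error(X, y, depth)
    best = base  # allow stopping early (a leaf), favoring sparsity
    for j in range(X.shape[1]):
        left = X[:, j] == 0
        err = (lookahead_error(X[left], y[left], depth - 1, lookahead - 1)
               + lookahead_error(X[~left], y[~left], depth - 1, lookahead - 1))
        best = min(best, err)
    return best

# Toy usage: labels are the XOR of two features, so one-step greedy
# scores are nearly uninformative, while two levels of lookahead can
# recover a zero-error depth-2 tree.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 6))
y = (X[:, 0] ^ X[:, 1]).astype(np.int64)
print(greedy_error(X, y, depth=2))                  # may be far from 0
print(lookahead_error(X, y, depth=2, lookahead=2))  # 0
```

The XOR example illustrates why the search effort is concentrated near the root rather than the leaves: the first split is where a myopic score is most misleading, while decisions close to the leaves have little room left to go wrong.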
Lay Summary: Decision trees ask simple questions about data to make a prediction. They can be easily interpreted as a flowchart. However, it is difficult to find well-performing decision trees for a given task.
The most popular algorithms are "greedy": they focus on the next best question to ask at every step without considering whether it leads to the best overall outcome. They are fast, but the resulting flowcharts can be much bigger than needed, and their predictions may not be as accurate. We might instead find the mathematically "optimal" flowchart that is also simple. But this requires us to go through all possible simple flowcharts to prove we've found the best one. It's like choosing the best question only after thinking through all paths it could lead to in the future. It yields accurate flowcharts, but it can be quite slow.
We bridge the gap between "greedy" and "optimal" algorithms: we build flowcharts (a.k.a. decision trees) by choosing each question based on information gathered a few steps into the future, rather than thinking through all possibilities. The resulting algorithm is almost as fast as greedy approaches, yet achieves accuracy comparable to optimal approaches.
Link To Code: https://github.com/VarunBabbar/SPLIT-ICML/tree/main
Primary Area: Social Aspects->Accountability, Transparency, and Interpretability
Keywords: Decision Tree Optimization, Interpretable Machine Learning, Discrete Optimization
Submission Number: 7628