Keywords: Rule Learning, Gradient Boosting, Branch-and-bound
TL;DR: The interpretability-accuracy trade-off of gradient-boosted rule ensembles is improved by a novel objective function.
Abstract: Gradient boosting of decision rules is an efficient approach to finding interpretable yet accurate machine learning models. However, in practice, interpretability requires limiting the number and size of the generated rules, and existing boosting variants are not designed for this purpose. Through their strictly greedy approach, they can increase accuracy only by adding further rules, even when the same gains could be achieved, in a more interpretable form, by altering already discovered rules. Here we address this shortcoming by adopting a weight correction step in each boosting round to maximise the predictive gain per added rule. This leads to a new objective function for rule selection that, based on orthogonal projections, anticipates the subsequent weight correction. Not only does this approach correctly approximate the ideal update of adding the risk gradient itself to the model, it also favours the inclusion of more general and thus shorter rules. Additionally, we derive a fast incremental algorithm for rule evaluation, which is necessary to enable efficient single-rule optimisation through either the greedy or the branch-and-bound approach. As we demonstrate on a range of classification, regression, and Poisson regression tasks, the resulting rule learner significantly improves the comprehensibility/accuracy trade-off of the fitted ensemble. At the same time, it has computational cost comparable to that of previous branch-and-bound rule learners.
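For illustration only, the following is a minimal Python sketch of what a boosting round with a fully-corrective weight update could look like, assuming squared loss (so the correction reduces to a least-squares refit) and a naive greedy scan over single-feature threshold rules. The projection-based score below is a crude stand-in for the objective described in the abstract; it is not the authors' algorithm, objective, or code, and the branch-and-bound search is not reproduced.

import numpy as np

def candidate_rules(X):
    """Yield simple axis-aligned rules (x[j] <= t or x[j] > t) as boolean masks."""
    n, d = X.shape
    for j in range(d):
        for t in np.unique(X[:, j]):
            yield (j, t, "<="), X[:, j] <= t
            yield (j, t, ">"), X[:, j] > t

def fit_rule_ensemble(X, y, n_rules=5):
    rules = []
    columns = [np.ones(len(y))]  # intercept acts as an always-true rule
    weights = np.linalg.lstsq(np.column_stack(columns), y, rcond=None)[0]
    for _ in range(n_rules):
        # negative gradient of squared loss = current residual
        residual = y - np.column_stack(columns) @ weights
        # score each candidate by how well its output explains the residual
        # after projecting out the columns already in the ensemble
        Q, _ = np.linalg.qr(np.column_stack(columns))
        best_cond, best_col, best_gain = None, None, -np.inf
        for cond, mask in candidate_rules(X):
            v = mask.astype(float)
            v_orth = v - Q @ (Q.T @ v)        # component orthogonal to current model
            norm = np.linalg.norm(v_orth)
            if norm < 1e-12:                  # rule output already in the span
                continue
            gain = (v_orth @ residual) ** 2 / norm ** 2
            if gain > best_gain:
                best_cond, best_col, best_gain = cond, v, gain
        rules.append(best_cond)
        columns.append(best_col)
        # fully-corrective step: refit all rule weights jointly (least squares)
        weights = np.linalg.lstsq(np.column_stack(columns), y, rcond=None)[0]
    return rules, weights

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    y = 2.0 * (X[:, 0] > 0) - 1.5 * (X[:, 1] <= 0.5) + 0.1 * rng.normal(size=200)
    rules, weights = fit_rule_ensemble(X, y, n_rules=4)
    print("rules:", rules)
    print("weights:", np.round(weights, 2))

The sketch illustrates the structural point made in the abstract: because all weights are refit after each addition, the selection criterion should score a candidate rule only by the part of its output that is orthogonal to the current ensemble.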
Supplementary Material: zip
Submission Number: 10498