Enhancing the Explainability of Gradient Boosting for Regression Problems through Comparable Samples Selection

TMLR Paper 2497 Authors

09 Apr 2024 (modified: 17 Sept 2024) · Rejected by TMLR · CC BY 4.0
Abstract: Gradient-Boosted Decision Trees (GBDT) is a highly effective learning method widely used for addressing classification and regression problems. However, akin to other ensemble methods, GBDT suffers from a lack of explainability. Explainability is a desirable property: the ability to discover relationships between input data attributes and the final model predictions is crucial for a comprehensive understanding of the GBDT method. To enhance the explainability of such algorithms, we propose to exhibit particular training data, referred to as \textit{comparable samples}, upon which the model heavily relies for specific predictions. To that end, we show that a GBDT prediction can be decomposed as a weighted sum of the training targets when specific loss functions are used. It is noteworthy that these weights may be negative. Furthermore, when predicting the response of a training sample, the weights associated with the other training samples in the decomposition vanish, indicating a potential overfitting issue. To overcome this issue, we introduce nonnegativity constraints on the weights and replace gradient descent with a procedure inspired by the Frank-Wolfe algorithm, called Explainable Gradient Boosting (ExpGB). The predictions generated by the proposed algorithm can be directly interpreted as convex combinations of the training targets. This allows for selecting training data resembling a given sample by comparing their decomposition coefficients. We conduct a comparative analysis with classical GBDT algorithms across diverse datasets to validate the estimation quality. Additionally, we evaluate the fidelity of comparable samples by demonstrating their proficiency in estimating the characteristics of the considered sample. Our approach, thus, offers a promising avenue for enhancing the explainability of GBDT and similar ensemble methods.
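To illustrate the decomposition the abstract refers to, the sketch below shows how, under the squared-error loss, a standard GBDT prediction can be traced back to a weighted sum of training targets by propagating each tree's leaf-averaging ("smoother") matrix through the boosting iterations. This is a minimal illustration and not the authors' ExpGB algorithm: it uses plain gradient boosting (so weights can be negative), and the function name and hyperparameters are illustrative assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gbdt_weight_decomposition(X_train, y_train, X_query,
                              n_estimators=50, learning_rate=0.1, max_depth=3):
    """Return W such that W @ y_train reproduces the boosted predictions
    for X_query under squared-error loss (a sketch, not the paper's method)."""
    n = len(y_train)
    # A[i, j]: weight of training target y_j in the current prediction for x_i
    A = np.full((n, n), 1.0 / n)             # F_0 = mean(y) -> uniform weights
    W = np.full((len(X_query), n), 1.0 / n)  # weights for the query points
    for _ in range(n_estimators):
        # With L2 loss, residuals are linear in y: r = (I - A) y
        residuals = y_train - A @ y_train
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X_train, residuals)
        # Tree smoother: row i averages the residuals of the training samples
        # that fall in the same leaf as sample i (leaf value = mean residual)
        leaves_train = tree.apply(X_train)
        leaves_query = tree.apply(X_query)
        S_train = (leaves_train[:, None] == leaves_train[None, :]).astype(float)
        S_train /= S_train.sum(axis=1, keepdims=True)
        S_query = (leaves_query[:, None] == leaves_train[None, :]).astype(float)
        S_query /= S_query.sum(axis=1, keepdims=True)
        # Leaf values are averages of residuals, hence also linear in y
        delta = np.eye(n) - A
        A = A + learning_rate * S_train @ delta
        W = W + learning_rate * S_query @ delta
    return W
```

Sorting the rows of `W` then surfaces the training samples that carry the largest weight for each query, which is the role of the comparable samples; the negative entries that appear here are precisely what ExpGB's nonnegativity constraints and Frank-Wolfe-style updates are designed to remove, so that the weights form a convex combination.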
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission: We clarified that the proposed method can be used with a wide variety of loss functions, not only the $\ell_2$ loss. We removed the beginning of Section 3 to reduce redundancy with the end of the previous section. We now state explicitly that the maximization of Eq. (12) does not require knowledge of the probability density. We also improved the notation.
Assigned Action Editor: ~Satoshi_Hara1
Submission Number: 2497