Trading Complexity for Sparsity in Random Forest Explanations

21 May 2021 (modified: 05 May 2023) · NeurIPS 2021 Submission · Readers: Everyone
Keywords: Interpretability, Random Forests, Combinatorial Optimization
TL;DR: We examine several types of explanations for random forest predictions, which together offer a trade-off between runtime complexity and sparsity.
Abstract: Random forests have long been considered powerful model ensembles in statistical machine learning. By training multiple decision trees, whose diversity is fostered through bagging and subspace sampling, the resulting random forest can lead to more stable and reliable predictions than a single decision tree. This, however, comes at the cost of decreased interpretability: although decision trees are often easy to interpret, the predictions made by random forests are much harder to understand, as they involve a majority vote among hundreds of decision trees. In this paper, we examine different types of reasons that explain "why" an input instance is classified as positive or negative by a Boolean random forest. Notably, as an approximation of sufficient reasons (which take the form of prime implicants of the random forest), we introduce majority reasons, which are prime implicants of a strict majority of decision trees. For these different abductive explanations, we investigate the tractability of the generation problem (finding one reason) and the minimization problem (finding one shortest reason). Experiments conducted on various datasets reveal a trade-off between runtime complexity and sparsity. In a nutshell, sufficient reasons, for which the identification problem was recently proved DP-complete, are slightly shorter than majority reasons, which can be generated by a simple polynomial-time greedy algorithm; minimal majority reasons, for which the identification problem is shown to be NP-complete, are significantly shorter than sufficient reasons, and they can be computed with a partial MaxSAT algorithm that turns out to be quite efficient in practice.
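To make the greedy approach mentioned in the abstract concrete, below is a minimal Python sketch of extracting a majority reason from a Boolean random forest. The tree encoding (leaves as integer class labels, internal nodes as (feature, left, right) triples), the dict-based term representation, and the helper names `implies_class` and `majority_reason` are all illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: greedy, polynomial-time extraction of a
# majority reason, under an assumed tree encoding (not the paper's code).

def implies_class(tree, term, target):
    """Return True iff every Boolean instance consistent with `term`
    (a dict mapping feature -> 0/1) reaches a `target` leaf of `tree`.
    A tree is either a leaf (an int class) or a (feature, left, right) triple."""
    if isinstance(tree, int):              # leaf: its class must match the target
        return tree == target
    feature, left, right = tree
    if feature in term:                    # feature fixed by the term: follow it
        return implies_class(left if term[feature] == 0 else right, term, target)
    # feature left free by the term: both subtrees must yield the target class
    return (implies_class(left, term, target)
            and implies_class(right, term, target))

def majority_reason(forest, instance, target):
    """Greedily drop literals from `instance` while the remaining term
    stays an implicant of a strict majority of the trees in `forest`."""
    term = dict(instance)
    threshold = len(forest) // 2 + 1       # strict majority of trees
    for feature in list(term):
        candidate = {f: v for f, v in term.items() if f != feature}
        if sum(implies_class(t, candidate, target) for t in forest) >= threshold:
            term = candidate               # literal is redundant: remove it
    return term

# Tiny usage example: a forest of three stumps over Boolean features 0 and 1.
forest = [(0, 0, 1), (0, 0, 1), (1, 0, 1)]   # each stump: (feature, leaf0, leaf1)
print(majority_reason(forest, {0: 1, 1: 0}, target=1))  # -> {0: 1}
```

A single left-to-right pass suffices here because removing literals only makes the term more general, so a removal that fails once can never succeed after further removals; the result is thus inclusion-minimal. Note that this sketch addresses only the generation problem; computing a shortest (minimal) majority reason is NP-complete per the abstract and is handled in the paper via a partial MaxSAT formulation, which is not reproduced here.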
Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.
Supplementary Material: zip