Keywords: interpretability, Dirichlet process, Bayesian optimization
Abstract: A desirable property of interpretable models is small size, so that they are easily understood by humans. This leads to two challenges: (a) small sizes typically diminish accuracy, and (b) the bespoke levers that different techniques offer for making this size-accuracy trade-off, e.g., L1 regularization, may be insufficient to reach the desired balance.
We address these challenges here. Earlier work has shown that learning the training distribution produces accurate small models. Our contribution is a new technique that exploits this idea. The training distribution is modeled as a Dirichlet Process for flexibility of representation, and its parameters are learned using Bayesian Optimization, a design choice that makes the technique applicable to non-differentiable loss functions. To avoid the challenges of high data dimensionality, the data is first projected down to one dimension using the uncertainty scores of a separate probabilistic model, which we refer to as the uncertainty oracle.
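The following is a minimal sketch of this pipeline, not the authors' code: a binned sampling distribution over oracle-uncertainty scores stands in for the paper's Dirichlet Process, and scikit-optimize's gp_minimize serves as the Bayesian optimizer. The oracle choice, bin count, weight ranges, and small-model family are all illustrative assumptions.

```python
# Sketch under assumptions: uncertainty oracle -> 1-D projection -> sampling
# distribution learned by Bayesian Optimization -> small interpretable model.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from skopt import gp_minimize
from skopt.space import Real

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

# Uncertainty oracle: any probabilistic model; score = 1 - max class probability.
oracle = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
uncert = 1.0 - oracle.predict_proba(X_tr).max(axis=1)  # 1-D projection

# Partition the 1-D uncertainty axis into bins; BO learns one weight per bin
# (a simple histogram stand-in for the paper's Dirichlet Process).
n_bins = 5
edges = np.quantile(uncert, np.linspace(0, 1, n_bins + 1))
bin_idx = np.clip(np.digitize(uncert, edges[1:-1]), 0, n_bins - 1)

def objective(weights):
    # Sampling distribution over training points induced by the bin weights.
    p = np.asarray(weights)[bin_idx]
    p = p / p.sum()
    idx = np.random.RandomState(0).choice(len(X_tr), size=len(X_tr), p=p)
    # Small interpretable model: a shallow decision tree.
    tree = DecisionTreeClassifier(max_depth=3, random_state=0)
    tree.fit(X_tr[idx], y_tr[idx])
    # Validation error; BO needs no gradients, so any loss works here.
    return 1.0 - tree.score(X_val, y_val)

space = [Real(1e-3, 1.0) for _ in range(n_bins)]
result = gp_minimize(objective, space, n_calls=30, random_state=0)
print("best validation error:", result.fun)
```

Because the optimizer only queries the loss value, the same loop applies unchanged to non-differentiable losses or other notions of model size, e.g., replacing max_depth with a sparsity constraint on a linear model.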
Based on exhaustive experiments, we show that this technique possesses multiple merits: (1) it significantly improves the accuracy of small models; (2) it is versatile: it may be applied to different model families with varying notions of size, e.g., the depth of a decision tree, the number of non-zero coefficients in a linear model, or simultaneously the maximum depth and the number of trees in Gradient Boosted Models; (3) it is practically convenient, since it requires only one hyperparameter to be set and works with non-differentiable losses; (4) it works even when the uncertainty oracle and the interpretable model use different feature spaces, e.g., a Gated Recurrent Unit trained on character sequences may serve as an oracle for a Decision Tree that uses character n-grams (see the sketch after this paragraph); and (5) it can lift the accuracy of fairly old techniques to be competitive with recent task-specialized ones, e.g., CART Decision Trees (1984) vs. Iterative Mistake Minimization (2020) on the task of cluster explanation.
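A hedged sketch of merit (4): the oracle and the interpretable model share only per-instance uncertainty scores, so their feature spaces may differ. Here an MLP on character unigrams stands in for the paper's GRU oracle, and fixed uncertainty-proportional sampling stands in for the learned distribution; the data and all parameters are illustrative.

```python
# Assumption-laden sketch: oracle features differ from the tree's features.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

texts = ["spam offer now", "hello friend", "win cash now", "see you soon"] * 50
labels = np.array([1, 0, 1, 0] * 50)

# Oracle features: character unigrams. Tree features: character 2-3-grams.
X_oracle = CountVectorizer(analyzer="char").fit_transform(texts)
X_tree = CountVectorizer(analyzer="char", ngram_range=(2, 3)).fit_transform(texts)

# Probabilistic oracle (stand-in for a GRU over character sequences).
oracle = MLPClassifier(max_iter=500, random_state=0).fit(X_oracle, labels)
uncert = 1.0 - oracle.predict_proba(X_oracle).max(axis=1)

# Sample the tree's training set in proportion to oracle uncertainty; the
# scores transfer across feature spaces because they are indexed by instance.
p = (uncert + 1e-6) / (uncert + 1e-6).sum()
idx = np.random.default_rng(0).choice(len(texts), size=len(texts), p=p)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tree[idx], labels[idx])
```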
Supplementary Material: zip
Primary Area: interpretability and explainable AI
Submission Number: 22131