Interpretable Mixture of Experts

Published: 26 May 2023, Last Modified: 26 May 2023, Accepted by TMLR
Abstract: The need for reliable model explanations is prominent in many machine learning applications, particularly for tabular and time-series data, whose use cases often involve high-stakes decision making. Towards this goal, we introduce a novel interpretable modeling framework, Interpretable Mixture of Experts (IME), that achieves high accuracy, comparable to "black-box" Deep Neural Networks (DNNs) in many cases, along with useful interpretability capabilities. IME consists of an assignment module and a mixture of experts, with each sample assigned to a single expert for prediction. We introduce multiple variants of IME, depending on whether the assignment module and the experts are interpretable. When the experts are interpretable models, such as linear models, IME yields an inherently interpretable architecture in which the explanations it produces are exact descriptions of how each prediction is computed. In addition to serving as a standalone inherently interpretable architecture, IME can be integrated with existing DNNs to offer interpretability for a subset of samples while maintaining the DNNs' accuracy. Through extensive experiments on 15 tabular and time-series datasets, IME is shown to be more accurate than single interpretable models and to perform comparably with existing state-of-the-art DNNs in accuracy. On most datasets, IME even outperforms DNNs while providing faithful explanations. Lastly, IME's explanations are compared to commonly used post-hoc explanation methods through a user study: participants are better able to predict the model's behavior when given IME's explanations, and they find IME's explanations more faithful and trustworthy.
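The routing idea in the abstract (each sample is assigned to a single expert, and with linear experts the explanation is the computation itself) can be illustrated with a minimal inference-time sketch. It assumes, hypothetically, a linear scoring function with argmax routing as the assignment module and linear-regression experts; the class names `LinearExpert`, `HardAssigner`, and `SketchIME`, and the optional `black_box` slot standing in for an existing DNN, are illustrative only and are not the paper's implementation. Training is omitted entirely.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass
class LinearExpert:
    """Hypothetical interpretable expert: y = bias + sum_i weights[i] * x[i]."""
    weights: List[float]
    bias: float

    def explain(self, x: Sequence[float]) -> List[float]:
        # Per-feature contributions w_i * x_i.
        return [w * xi for w, xi in zip(self.weights, x)]

    def predict(self, x: Sequence[float]) -> float:
        # The prediction is exactly bias + sum(contributions).
        return self.bias + sum(self.explain(x))


@dataclass
class HardAssigner:
    """Hypothetical assignment module: one linear score per slot, argmax routing."""
    score_weights: List[List[float]]  # one row per expert slot
    score_biases: List[float]

    def assign(self, x: Sequence[float]) -> int:
        scores = [b + sum(w * xi for w, xi in zip(row, x))
                  for row, b in zip(self.score_weights, self.score_biases)]
        # Ties go to the lowest index.
        return max(range(len(scores)), key=scores.__getitem__)


@dataclass
class SketchIME:
    """Illustrative hard-routing mixture; not the paper's training procedure."""
    assigner: HardAssigner
    experts: List[LinearExpert]
    # Optional black-box model (e.g. a pretrained DNN) occupying the last slot.
    black_box: Optional[Callable[[Sequence[float]], float]] = None

    def __post_init__(self) -> None:
        slots = len(self.experts) + (1 if self.black_box is not None else 0)
        if len(self.assigner.score_weights) != slots:
            raise ValueError("assigner must score exactly one slot per expert")

    def predict_and_explain(self, x: Sequence[float]) -> Tuple[float, Optional[dict]]:
        k = self.assigner.assign(x)
        if k < len(self.experts):
            expert = self.experts[k]
            explanation = {"expert": k, "bias": expert.bias,
                           "contributions": expert.explain(x)}
            return expert.predict(x), explanation
        # Routed to the black-box slot: prediction only, no exact explanation.
        return self.black_box(x), None


if __name__ == "__main__":
    experts = [LinearExpert([2.0, 0.0], 1.0), LinearExpert([0.0, -1.0], 5.0)]
    assigner = HardAssigner([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]], [0.0, 0.0, -10.0])
    ime = SketchIME(assigner, experts, black_box=lambda x: 42.0)
    for sample in ([3.0, 4.0], [-2.0, 1.0], [0.0, 20.0]):
        print(sample, ime.predict_and_explain(sample))
```

For a sample routed to a linear expert, the bias plus the per-feature contributions add up exactly to the prediction, which is the sense in which the explanation describes the computation. Samples routed to the black-box slot receive the stand-in DNN's prediction with no such explanation, loosely mirroring the abstract's description of offering interpretability for only a subset of samples.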
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission: We have made the changes requested by the reviewers and highlighted the major changes in red. Changes include:
- Adding DNN+ past experiments to the appendix.
- Adding further ablation studies to the appendix.
- Adding the classification equation to the appendix.
- Moving the sample interpretability experiments from the appendix to the main text.
- Minor text clarifications.
Assigned Action Editor: Frederic Sala
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Number: 923