Seeking Interpretability and Explainability in Binary Activated Neural Networks

Published: 01 Jan 2024 · Last Modified: 10 Jan 2025 · xAI 2024 · CC BY-SA 4.0
Abstract: We study the use of binary activated neural networks as interpretable and explainable predictors in the context of regression tasks on tabular data. We first dissect these specific networks to better understand their inner workings and bound their best achievable training performance. We then use this analysis as a theoretical foundation to propose a greedy algorithm for building interpretable binary activated networks. Since the simplicity of a predictor is instrumental to its interpretability, our approach builds predictors one neuron at a time, so that their architecture (complexity) suits the task at hand. Finally, we present an approach based on the efficient computation of SHAP values for quantifying the relative importance of the features, hidden neurons and individual connections (weights) of these particular networks. Our work sets forth a new family of predictors to consider when interpretability is of importance.
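To make the setting concrete, the following is a minimal sketch of a one-hidden-layer binary activated network used as a regressor on tabular data. It is not the authors' implementation; the ±1 activation, the layer shapes, and the function names (`binary_activation`, `ban_forward`) are illustrative assumptions.

```python
import numpy as np

def binary_activation(z):
    """Binary activation: maps each pre-activation to -1 or +1."""
    return np.where(z >= 0.0, 1.0, -1.0)

def ban_forward(x, W1, b1, w2, b2):
    """Forward pass of a one-hidden-layer binary activated network (regression).

    x  : (d,) input feature vector (one row of a tabular dataset)
    W1 : (k, d) hidden-layer weights, b1 : (k,) hidden-layer biases
    w2 : (k,) output weights, b2 : scalar output bias
    """
    h = binary_activation(W1 @ x + b1)  # each hidden neuron outputs -1 or +1
    return float(w2 @ h + b2)           # real-valued regression output

# Example: a network with d=3 input features and k=2 hidden neurons.
rng = np.random.default_rng(0)
x = rng.normal(size=3)
W1, b1 = rng.normal(size=(2, 3)), rng.normal(size=2)
w2, b2 = rng.normal(size=2), 0.5
print(ban_forward(x, W1, b1, w2, b2))
```

Because each hidden neuron can only output -1 or +1, a network with k hidden neurons partitions the input space into at most 2^k regions, each assigned a constant prediction; this finite, enumerable structure is what the abstract leverages for interpretability, and what motivates adding neurons greedily rather than fixing the architecture in advance.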