Interpretability-by-Design with Accurate Locally Additive Models and Conditional Feature Effects

Published: 02 Jun 2026, Last Modified: 21 Jun 2026, Venue: Greeks in AI 2026, Readers: Everyone, License: CC BY 4.0
Keywords: explainability; explainable-by-design; interpretability
Domains: Machine Learning Theory, Other
TL;DR: We narrow the accuracy-interpretability gap: our method models feature interactions without sacrificing the univariate plots and global auditing that make GAMs trustworthy.
External Link: https://arxiv.org/abs/2602.16503
Abstract: Generalized additive models (GAMs) offer interpretability through independent univariate feature effects, but they underfit when the data contain interactions. GA$^2$Ms add selected pairwise interactions, which improves accuracy but sacrifices interpretability and limits model auditing. We propose Conditionally Additive Local Models (CALMs), a new model class that balances the interpretability of GAMs with the accuracy of GA$^2$Ms. CALMs allow multiple univariate shape functions per feature, each active in a different region of the input space. These regions are defined independently for each feature as simple logical conditions (thresholds) on the features it interacts with. As a result, effects remain locally additive while varying across subregions to capture interactions. We further propose a principled distillation-based training pipeline that identifies homogeneous regions with limited interactions and fits interpretable shape functions via region-aware backfitting. Experiments on diverse classification and regression tasks show that CALMs consistently outperform GAMs and achieve accuracy broadly comparable to GA$^2$Ms, while preserving the univariate auditability that GA$^2$Ms forfeit. Overall, CALMs offer a favorable trade-off between predictive accuracy and interpretability.
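To make the model class described in the abstract concrete, here is a minimal sketch of how a CALM-style prediction could be computed, under stated assumptions. It is not the authors' implementation: the two-feature setup, the function names (`f0_low`, `f0_high`, `f1`, `calm_predict`), the shape functions, and the threshold at 0 are all hypothetical and chosen only for illustration. Feature `x0` has two univariate shape functions, and a threshold condition on `x1` decides which one is active; `x1` has a single shape function.

```python
import numpy as np

# Hypothetical toy CALM with two features, x0 and x1 (illustrative only).
# Feature 0 has two univariate shape functions; the active one is selected
# by a threshold condition on feature 1, the feature it interacts with.
# Feature 1 has a single shape function that is active everywhere.

def f0_low(x0):
    # Effect of x0 in the region x1 <= 0 (assumed form).
    return 2.0 * x0

def f0_high(x0):
    # Effect of x0 in the region x1 > 0 (assumed form).
    return -1.0 * x0

def f1(x1):
    # Effect of x1, the same in every region (assumed form).
    return 0.5 * x1 ** 2

def calm_predict(X, intercept=0.0):
    """Sum the per-feature effects, picking the shape function per region."""
    x0, x1 = X[:, 0], X[:, 1]
    # Within each region the model is additive; only the choice of the
    # shape function for x0 depends on x1.
    effect0 = np.where(x1 <= 0.0, f0_low(x0), f0_high(x0))
    return intercept + effect0 + f1(x1)

X = np.array([[1.0, -1.0],   # x1 <= 0, so f0_low is active
              [1.0,  2.0]])  # x1 > 0, so f0_high is active
preds = calm_predict(X)
print(preds)  # [2.5 1. ]
```

In this sketch the effect of `x0` can still be audited as two univariate curves, one per region, rather than as a two-dimensional interaction surface. How regions are actually found and how shape functions are fitted (the distillation and region-aware backfitting steps) is not shown here.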
Submission Number: 86