Interpretable Generalized Additive Models for Datasets with Missing Values

Hayden McTavish; Jon Donnelly; Margo Seltzer; Cynthia Rudin

Published: 25 Sept 2024, Last Modified: 14 Jan 2025

Venue: NeurIPS 2024 poster

License: CC BY 4.0

Keywords: Interpretability, Missing Data, Generalized Additive Models, Sparsity

TL;DR: We introduce an interpretable GAM approach for missing data which improves accuracy under synthetic missingness while globally improving sparsity, all with no significant cost to real-world accuracy or runtime.

Abstract: Many important datasets contain samples that are missing one or more feature values. Maintaining the interpretability of machine learning models in the presence of such missing data is challenging. Singly or multiply imputing missing values complicates the model’s mapping from features to labels. On the other hand, reasoning on indicator variables that represent missingness introduces a potentially large number of additional terms, sacrificing sparsity. We solve these problems with M-GAM, a sparse generalized additive modeling approach that incorporates missingness indicators and their interaction terms while maintaining sparsity through $\ell_0$ regularization. We show that M-GAM provides accuracy similar or superior to that of prior methods while significantly improving sparsity relative to either imputation or naïve inclusion of indicator variables.
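To make the idea of missingness indicators and their interaction terms concrete, here is a minimal sketch of how such a design matrix could be assembled with NumPy. It is an illustration only, not the authors' implementation: the function name `augment_with_missingness`, the choice to zero-fill missing entries, and the product form of the interactions are all assumptions, and the $\ell_0$-regularized fitting step that would select a sparse subset of these columns is not shown.

```python
import numpy as np

def augment_with_missingness(X):
    """Hypothetical helper: build main, indicator, and interaction columns.

    Assumes X is a 2-D float array with at least two features, where NaN
    marks a missing value. This is an illustrative construction only.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)                # True where a value is absent
    filled = np.where(missing, 0.0, X)   # missing entries contribute 0 to main terms
    indicators = missing.astype(float)   # one missingness indicator per feature
    n_features = X.shape[1]
    # Interaction of "feature j is missing" with the observed value of feature k.
    interactions = np.column_stack([
        indicators[:, j] * filled[:, k]
        for j in range(n_features)
        for k in range(n_features)
        if j != k
    ])
    return filled, indicators, interactions

# Toy data: three samples, two features, two missing entries.
X = np.array([[1.0, np.nan],
              [2.0, 3.0],
              [np.nan, 4.0]])
filled, indicators, interactions = augment_with_missingness(X)

# Candidate design matrix; a sparse solver would keep only a few columns.
design = np.hstack([filled, indicators, interactions])
```

The number of interaction columns grows quadratically with the number of features, which is the sparsity concern the abstract raises and the reason a sparsity-inducing penalty is applied on top of a construction like this.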

Primary Area: Interpretability and explainability

Submission Number: 11945
