Feature Importance Metrics in the Presence of Missing Data

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: Addressing feature importance in the presence of missing data by contrasting full and observed data perspectives and introducing a novel metric to guide feature acquisition decisions.
Abstract: Feature importance metrics are critical for interpreting machine learning models and understanding the relevance of individual features. However, real-world data often exhibit missingness, which complicates how feature importance should be evaluated. We introduce the distinction between two evaluation frameworks under missing data: (1) feature importance under the full data, as if every feature had been fully measured, and (2) feature importance under the observed data, where missingness is governed by the current measurement policy. While the full data perspective offers insights into the data-generating process, it often relies on unrealistic assumptions and cannot guide decisions when missingness persists at model deployment. Since neither framework directly informs improvements in data collection, we additionally introduce the feature measurement importance gradient (FMIG), a novel, model-agnostic metric that identifies features that should be measured more frequently to enhance predictive performance. Using synthetic data, we illustrate key differences between these metrics and the risks of conflating them.
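To make the distinction concrete, the following is a minimal, illustrative sketch on synthetic data, not the paper's implementation. It computes leave-one-covariate-out (LOCO) importance once on fully observed data and once under a measurement policy that drops a feature for some samples, and it approximates a measurement-importance gradient by a finite difference in the measurement rate. The function names (`loco_importance`, `apply_policy`, `test_mse_under_rate`), the mean-imputation handling of missing values, and the finite-difference stand-in for FMIG are all assumptions made for illustration.

```python
# Illustrative sketch: full-data vs. observed-data LOCO importance, plus a crude
# finite-difference proxy for a feature measurement importance gradient.
# Modelling and imputation choices here are assumptions, not the paper's method.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 2))                                  # two features
y = 2.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

def loco_importance(X, y, drop_idx):
    """LOCO: increase in test MSE when one covariate is removed before fitting."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    full = LinearRegression().fit(X_tr, y_tr)
    keep = [j for j in range(X.shape[1]) if j != drop_idx]
    reduced = LinearRegression().fit(X_tr[:, keep], y_tr)
    return (mean_squared_error(y_te, reduced.predict(X_te[:, keep]))
            - mean_squared_error(y_te, full.predict(X_te)))

def apply_policy(X, rate, rng):
    """Measurement policy: feature 1 is observed with probability `rate`;
    unobserved values are mean-imputed (one simple handling choice)."""
    Xm = X.copy()
    miss = rng.random(len(X)) > rate
    if miss.any():
        Xm[miss, 1] = X[~miss, 1].mean() if (~miss).any() else 0.0
    return Xm

def test_mse_under_rate(rate):
    """Predictive test error when feature 1 is measured at the given rate."""
    Xm = apply_policy(X, rate, np.random.default_rng(1))
    X_tr, X_te, y_tr, y_te = train_test_split(Xm, y, random_state=0)
    model = LinearRegression().fit(X_tr, y_tr)
    return mean_squared_error(y_te, model.predict(X_te))

# (1) Full-data LOCO importance of feature 1.
print("LOCO, full data:     ", loco_importance(X, y, drop_idx=1))

# (2) Observed-data LOCO importance under a 30% measurement rate for feature 1.
X_obs = apply_policy(X, rate=0.3, rng=np.random.default_rng(1))
print("LOCO, observed data: ", loco_importance(X_obs, y, drop_idx=1))

# (3) Finite-difference proxy: how much does test error improve if feature 1
#     were measured slightly more often? (A stand-in for an FMIG-style quantity.)
eps = 0.1
print("Measurement gradient proxy:",
      (test_mse_under_rate(0.3) - test_mse_under_rate(0.3 + eps)) / eps)
```

Under this toy setup, the observed-data LOCO value is typically smaller than the full-data value because imputation dilutes the feature's usable signal, while the gradient proxy indicates how much additional measurement of that feature would pay off in predictive performance.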
Lay Summary: To understand how machine learning models make predictions, researchers often ask: which pieces of information matter most? This is known as feature importance. For example, in a clinical diagnosis model, we might want to know which symptoms or tests — like blood pressure or lab results — most influence the diagnosis. But in many real-world settings, not all information is available. Tests can be expensive or invasive, so they aren’t performed for every patient. This raises a key question: how should we think about feature importance when some data is missing? We address this by distinguishing between three goals. One asks what feature importance would look like if no data were missing — offering a clean view of each variable’s effect. Another works with the data we actually observe, accepting that missingness will persist in practice. The third asks how we should improve data collection — for example, which tests a hospital should run more often. We introduce a new metric for this third question and provide mathematical tools to compute all three. Using simple simulated datasets, we show how the answers can differ — highlighting the importance of matching the definition of feature importance to the real-world decision at hand.
Primary Area: General Machine Learning
Keywords: feature importance, missing data, leave-one-covariate-out, feature acquisition, full data, observed data
Submission Number: 10161