Keywords: Interpretable Machine Learning, moderating effect, science
TL;DR: We propose a SHAP interaction–based framework to automatically discover moderators in data, enabling interpretable, systematic detection of conditional effects.
Abstract: Machine Learning (ML) is increasingly applied across the sciences, accelerating simulations, automating data preparation, and improving predictive accuracy. Yet most efforts emphasize efficiency and performance, with limited attention to interpretability, leaving largely unexplored how ML can drive discovery: uncovering novel patterns in data and advancing scientific theory. Moderation effects, where the influence of one variable depends on the level of another, are central to disciplines such as the social and behavioral sciences. However, they are typically studied through a theory-driven process based on regression models with manually specified interactions. While insightful, this approach scales poorly and may miss unexpected moderators.
We introduce an automated, interpretable framework for moderator discovery based on SHAP interaction values. Our method computes global interaction contributions from a predictive model, quantifies their dependence on constituent features, and identifies statistically significant moderators. In experiments on real-world datasets, the framework not only recovers known, theory-consistent moderating effects but also uncovers novel moderator candidates. These results illustrate how explainable ML can move beyond prediction toward systematic discovery, offering scientists a scalable tool to reveal conditional relationships that inform theory development.
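As a rough illustration of the pipeline the abstract describes, the sketch below (not the authors' code) fits a tree-based model, computes pairwise SHAP interaction values with `shap.TreeExplainer`, ranks feature pairs by global interaction strength, and tests whether each pairwise contribution varies with one of its constituent features. The function name `discover_moderators`, the choice of XGBoost, the Spearman correlation test, and the Bonferroni-style correction are all assumptions for the sake of a concrete example, not details taken from the paper.

```python
# Minimal sketch of a SHAP interaction-based moderator search (assumptions noted above).
import numpy as np
import xgboost
import shap
from scipy import stats

def discover_moderators(X, y, alpha=0.05):
    """X: pandas DataFrame of features, y: array-like target (hypothetical helper)."""
    model = xgboost.XGBRegressor(n_estimators=300, max_depth=4).fit(X, y)
    # (n_samples, n_features, n_features) tensor of pairwise SHAP interaction values
    inter = shap.TreeExplainer(model).shap_interaction_values(X)

    candidates = []
    n_feat = X.shape[1]
    for i in range(n_feat):
        for j in range(i + 1, n_feat):
            # Total pairwise contribution: the interaction matrix is symmetric,
            # so add both off-diagonal entries.
            pair = inter[:, i, j] + inter[:, j, i]
            global_strength = np.abs(pair).mean()  # global interaction contribution
            # Does the interaction contribution depend on a constituent feature?
            for m in (i, j):
                r, p = stats.spearmanr(X.iloc[:, m], pair)
                candidates.append({
                    "effect": X.columns[j if m == i else i],
                    "moderator": X.columns[m],
                    "global_strength": float(global_strength),
                    "spearman_r": float(r),
                    "p_value": float(p),
                })
    # Bonferroni-style correction over all tests performed (an assumed choice).
    for c in candidates:
        c["significant"] = c["p_value"] < alpha / len(candidates)
    return sorted(candidates, key=lambda c: -c["global_strength"])
```

The monotone (Spearman) test is just one way to operationalize "dependence of the interaction on a constituent feature"; the paper's own significance procedure may differ.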
Supplementary Material: zip
Primary Area: interpretability and explainable AI
Submission Number: 14979