Testing Most Influential Sets

Published: 26 Jan 2026, Last Modified: 26 Feb 2026. ICLR 2026 Poster. License: CC BY 4.0
Keywords: attribution, robustness auditing, causal inference, fairness, least squares, extreme value
TL;DR: Theoretical foundations and procedures for testing whether most influential sets exert excessive influence
Abstract: Small influential data subsets can dramatically affect model conclusions, with a few data points overturning key findings. While recent work identifies these most influential sets, there is no formal way to tell when maximum influence is excessive rather than expected under natural random sampling variation. We address this gap by developing a principled framework for most influential sets. Focusing on linear least squares, we derive a convenient exact influence formula and identify the extreme value distributions of maximal influence: the heavy-tailed Fréchet for constant-size sets and heavy-tailed data, and the well-behaved Gumbel for growing sets or light tails. This allows us to conduct rigorous hypothesis tests for excessive influence. We demonstrate the framework through applications in economics, biology, and machine learning benchmarks, resolving contested findings and replacing ad-hoc heuristics with rigorous inference.
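The abstract refers to an exact influence formula for linear least squares. As a minimal sketch, assuming ordinary least squares with a design matrix that keeps full column rank after deletion, the Python below uses the classical set-deletion identity from regression diagnostics (which may differ in form from the paper's own formula) to compute how the coefficients change when a set S of rows is removed. It checks that result against a direct refit, then finds the size-k set with the largest change in one coefficient by brute force. The names `ols`, `set_deletion_change`, and `most_influential_set`, and the choice of absolute coefficient change as the influence criterion, are hypothetical illustrations, not the authors' code.

```python
from itertools import combinations

import numpy as np


def ols(X, y):
    """Ordinary least-squares coefficients via a numerically stable solver."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def set_deletion_change(X, y, S):
    """Exact change beta_hat - beta_hat_(-S) from deleting the rows in S.

    Classical set-deletion identity from regression diagnostics:
        beta_hat - beta_hat_(-S) = (X'X)^{-1} X_S' (I - H_SS)^{-1} e_S,
    where e_S are the full-sample residuals of rows S and H_SS is the block of
    the hat matrix H = X (X'X)^{-1} X' for those rows. Assumes X keeps full
    column rank after the rows in S are removed (so I - H_SS is invertible).
    """
    S = np.asarray(S)
    XtX_inv = np.linalg.inv(X.T @ X)
    resid = y - X @ (XtX_inv @ (X.T @ y))  # full-sample residuals
    X_S = X[S]
    H_SS = X_S @ XtX_inv @ X_S.T  # hat-matrix block for the deleted rows
    weighted = np.linalg.solve(np.eye(len(S)) - H_SS, resid[S])
    return XtX_inv @ X_S.T @ weighted


def most_influential_set(X, y, k, j):
    """Brute-force search for the size-k set with the largest |change| in coefficient j.

    Enumerates all C(n, k) subsets, so it is only practical for tiny n and k.
    Hypothetical illustration of the quantity being maximized, not a test.
    """
    best_set, best_val = None, -np.inf
    for S in combinations(range(X.shape[0]), k):
        val = abs(set_deletion_change(X, y, list(S))[j])
        if val > best_val:
            best_set, best_val = S, val
    return best_set, best_val


# Usage on simulated heavy-tailed data (intercept plus one regressor).
rng = np.random.default_rng(0)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.standard_t(df=3, size=n)

S = [3, 17]
keep = np.setdiff1d(np.arange(n), S)
direct = ols(X, y) - ols(X[keep], y[keep])  # change obtained by refitting
print(np.allclose(set_deletion_change(X, y, S), direct))  # True
print(most_influential_set(X, y, k=2, j=1))
```

The closed form avoids refitting the model for every candidate set, but enumeration still grows combinatorially in n and k, so this sketch only illustrates the maximal-influence quantity; it does not implement the paper's extreme-value hypothesis test.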
Supplementary Material: zip
Primary Area: interpretability and explainable AI
Submission Number: 10176