Statistical Inference for Model Responsiveness Audits

Published: 23 Sept 2025 · Last Modified: 18 Nov 2025 · NeurIPS 2025 Poster · CC BY 4.0
Keywords: Safety, Recourse, Interventions, Auditing
TL;DR: We demonstrate how to use statistical inference to audit the responsiveness of a machine learning model.
Abstract: Many safety failures in machine learning arise when models assign predictions to people – e.g., in lending, hiring, or content moderation – without accounting for how individuals can change their inputs. We introduce a formal validation procedure for the responsiveness of predictions with respect to interventions on their features. Our procedure frames responsiveness as a type of sensitivity analysis in which practitioners control a set of changes by specifying constraints over interventions and distributions over downstream effects. We describe how to estimate responsiveness for the predictions of any model and any dataset using only black-box access, and design algorithms that use these estimates to support tasks such as falsification and failure probability analysis. The resulting audits uncover the problem at hand and enable community or regulatory oversight: when lack (or excess) of responsiveness is negligible, off-the-shelf models suffice; when material, findings motivate redesign (e.g., strategic classification) or policy changes. We demonstrate these safety benefits and illustrate how collective stakeholders can help steer AI systems.
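The abstract describes estimating responsiveness with only black-box access: sample feasible interventions on an individual's features, query the model, and estimate the probability that the prediction changes. A minimal sketch of that idea is below; the function names (`estimate_responsiveness`, `sample_intervention`) and the toy threshold model are illustrative assumptions, not the paper's actual interface, and the paper's constraint specification and downstream-effect distributions are abstracted into a single user-supplied sampler.

```python
import numpy as np

def estimate_responsiveness(predict, x, sample_intervention, n_samples=1000, seed=0):
    """Monte Carlo estimate of the probability that a feasible
    intervention changes a black-box model's prediction for point x.

    predict: black-box function mapping a feature vector to a label
    sample_intervention: draws a feasible post-intervention feature
        vector for x (encodes the auditor's constraints; assumed here)
    """
    rng = np.random.default_rng(seed)
    y0 = predict(x)
    # count how often a sampled intervention flips the prediction
    flips = sum(predict(sample_intervention(x, rng)) != y0 for _ in range(n_samples))
    p_hat = flips / n_samples
    # 95% normal-approximation confidence interval on the flip probability
    half = 1.96 * np.sqrt(p_hat * (1 - p_hat) / n_samples)
    return p_hat, (max(0.0, p_hat - half), min(1.0, p_hat + half))

# toy usage (hypothetical): a threshold model over a single income feature;
# feasible interventions raise income by up to 10 units
predict = lambda x: int(x[0] >= 50)
sample_intervention = lambda x, rng: np.array([x[0] + rng.uniform(0, 10)])
p_hat, ci = estimate_responsiveness(predict, np.array([45.0]), sample_intervention)
```

The returned point estimate and interval support the falsification and failure-probability tasks the abstract mentions: an interval concentrated near zero is evidence that predictions are unresponsive to the specified interventions, which under the paper's framing would motivate model redesign or policy review.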
Submission Number: 28