Influence Attributions can be Systematically Altered by Model Manipulation

Published: 03 Feb 2026, Last Modified: 02 May 2026 · AISTATS 2026 Poster · CC BY 4.0
TL;DR: Influence-based Attributions can be Systematically Altered by Model Manipulation
Abstract: Influence functions are a standard tool for attributing predictions to training data in a principled manner and are widely used in applications such as data valuation and fairness. In this work, we present realistic incentives to manipulate influence-based attributions and investigate whether an adversary can *systematically* alter these attributions. We show that small, systematic perturbations to a model can indeed alter influence-based attributions *as desired*. We experiment with logistic regression models trained on ResNet feature embeddings and on standard tabular fairness datasets, and provide efficient attacks with backward-friendly implementations. Our work raises questions about the reliability of influence-based attributions in adversarial settings.
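The influence functions referenced in the abstract are the standard Koh & Liang formulation, I(z_i, z_test) = -∇L(z_test)ᵀ H⁻¹ ∇L(z_i), which for logistic regression can be computed in closed form from per-example gradients and the regularized Hessian. The sketch below is an illustrative NumPy implementation of this baseline attribution (not the paper's attack code); the synthetic data, hyperparameters, and function names are all assumptions for demonstration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logreg(X, y, l2=1e-2, iters=500, lr=0.5):
    # Plain gradient-descent fit of L2-regularized logistic regression
    # (illustrative; any solver works).
    w = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(iters):
        p = sigmoid(X @ w)
        w -= lr * (X.T @ (p - y) / n + l2 * w)
    return w

def influence_scores(X, y, x_test, y_test, w, l2=1e-2):
    # Influence of each training point on the test loss:
    #   I(z_i, z_test) = -grad_test^T H^{-1} grad_i
    n = len(y)
    p = sigmoid(X @ w)
    # Hessian of the regularized mean log-loss at w.
    H = (X.T * (p * (1 - p))) @ X / n + l2 * np.eye(X.shape[1])
    p_test = sigmoid(x_test @ w)
    grad_test = (p_test - y_test) * x_test
    h_inv_grad = np.linalg.solve(H, grad_test)
    # Per-example training-loss gradients, one row per point.
    grads = X * (p - y)[:, None]
    return -grads @ h_inv_grad

# Toy data standing in for feature embeddings (assumption for the demo).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.1 * rng.normal(size=200) > 0).astype(float)
w = fit_logreg(X, y)
scores = influence_scores(X, y, X[0], y[0], w)
print(scores.shape)  # one influence score per training point
```

The attacks studied in the paper perturb the model parameters w so that this ranking of scores changes in an adversary-chosen way while predictions stay largely intact; see the linked repository for the actual implementations.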
Code Dataset Promise: Yes
Code Dataset Url: https://github.com/infinite-pursuits/influence-based-attributions-can-be-manipulated
Signed Copyright Form: pdf
Format Confirmation: I agree that I have read and followed the formatting instructions for the camera ready version.
Submission Number: 619