Influence Attributions can be Systematically Altered by Model Manipulation

Published: 03 Feb 2026, Last Modified: 02 May 2026 · AISTATS 2026 Poster · CC BY 4.0
TL;DR: Influence-based Attributions can be Systematically Altered by Model Manipulation
Abstract: Influence functions are a standard tool for attributing predictions to training data in a principled manner and are widely used in applications such as data valuation and fairness. In this work, we present realistic incentives to manipulate influence-based attributions and investigate whether an adversary can *systematically* alter these attributions. We show that small, systematic perturbations to a model can indeed alter influence-based attributions *as desired*. We experiment with logistic regression models trained on ResNet feature embeddings and on standard tabular fairness datasets, and provide efficient attacks with backward-friendly implementations. Our work raises questions about the reliability of influence-based attributions in adversarial settings.
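The influence functions referenced in the abstract are the standard Koh & Liang formulation, I(z_i, z_test) = -∇L(z_test)ᵀ H⁻¹ ∇L(z_i), which for logistic regression can be computed in closed form from per-example gradients and the regularized Hessian. The sketch below is an illustrative NumPy implementation of this baseline attribution (not the paper's attack code); the synthetic data, hyperparameters, and function names are all assumptions for demonstration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logreg(X, y, l2=1e-2, iters=500, lr=0.5):
    # Plain gradient-descent fit of L2-regularized logistic regression
    # (illustrative; any solver works).
    w = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(iters):
        p = sigmoid(X @ w)
        w -= lr * (X.T @ (p - y) / n + l2 * w)
    return w

def influence_scores(X, y, x_test, y_test, w, l2=1e-2):
    # Influence of each training point on the test loss:
    #   I(z_i, z_test) = -grad_test^T H^{-1} grad_i
    n = len(y)
    p = sigmoid(X @ w)
    # Hessian of the regularized mean log-loss at w.
    H = (X.T * (p * (1 - p))) @ X / n + l2 * np.eye(X.shape[1])
    p_test = sigmoid(x_test @ w)
    grad_test = (p_test - y_test) * x_test
    h_inv_grad = np.linalg.solve(H, grad_test)
    # Per-example training-loss gradients, one row per point.
    grads = X * (p - y)[:, None]
    return -grads @ h_inv_grad

# Toy data standing in for feature embeddings (assumption for the demo).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.1 * rng.normal(size=200) > 0).astype(float)
w = fit_logreg(X, y)
scores = influence_scores(X, y, X[0], y[0], w)
print(scores.shape)  # one influence score per training point
```

The attacks studied in the paper perturb the model parameters w so that this ranking of scores changes in an adversary-chosen way while predictions stay largely intact; see the linked repository for the actual implementations.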
Code Dataset Promise: Yes
Code Dataset Url: https://github.com/infinite-pursuits/influence-based-attributions-can-be-manipulated
Signed Copyright Form: pdf
Format Confirmation: I agree that I have read and followed the formatting instructions for the camera ready version.
Submission Number: 619