Keywords: Algorithmic Fairness, Adversarial Robustness, Explainable AI, SHAP, AI Fairness
Abstract: As machine learning models grow more influential and opaque, algorithmic fairness and explainability are critical for ensuring accountability. However, we demonstrate that these auditing mechanisms are themselves vulnerable to subtle manipulation that camouflages the influence of protected features. While prior work on data-agnostic attacks has exposed this vulnerability, such attacks leave behind detectable artifacts that compromise their stealth. We introduce Targeted Identity Re-Association (TIRA) attacks, a novel family of attacks that iteratively and probabilistically manipulate a model's outputs without requiring access to the model's internals or feature representations. We formalize two algorithms: Probabilistic Micro-Shuffling (PMiS), which applies localized adjacent swaps, and Probabilistic Rank-Shift Micro-Perturbation (PRSMP), which introduces small, randomized rank shifts. We empirically demonstrate that TIRA attacks are highly effective at pushing fairness metrics toward their ideal values. Crucially, TIRA attacks successfully confound SHAP-based explanations, leaving effectively zero residual attribution for protected features, a major improvement over prior work.
Track: Track 2: ML by Muslim Authors
Submission Number: 9
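As a rough illustration of the two perturbations the abstract describes, the sketch below applies localized adjacent swaps (PMiS) and small randomized rank shifts (PRSMP) to a ranked model output. This is a minimal sketch based only on the abstract's one-line descriptions: the function names, the parameters (swap_prob, max_shift, shift_prob), and the list-of-items ranking representation are all hypothetical and not taken from the paper.

```python
# Hypothetical sketch of the TIRA perturbations named in the abstract.
# All names and parameters are illustrative; the paper's actual
# formulation may differ.
import random


def pmis(ranking, swap_prob=0.1, rng=None):
    """Probabilistic Micro-Shuffling (sketch): walk the ranked output
    and swap each adjacent pair with a small probability, producing
    localized adjacent swaps."""
    rng = rng or random.Random()
    out = list(ranking)
    for i in range(len(out) - 1):
        if rng.random() < swap_prob:
            out[i], out[i + 1] = out[i + 1], out[i]
    return out


def prsmp(ranking, max_shift=2, shift_prob=0.1, rng=None):
    """Probabilistic Rank-Shift Micro-Perturbation (sketch): with a
    small probability, move an item by a random offset of at most
    max_shift positions, introducing small randomized rank shifts."""
    rng = rng or random.Random()
    out = list(ranking)
    for idx in range(len(out)):
        if rng.random() < shift_prob:
            offset = rng.randint(-max_shift, max_shift)
            new_idx = min(max(idx + offset, 0), len(out) - 1)
            out.insert(new_idx, out.pop(idx))
    return out


# Example usage on a toy ranking of ten items.
if __name__ == "__main__":
    ranking = list(range(10))
    print(pmis(ranking, swap_prob=0.2, rng=random.Random(0)))
    print(prsmp(ranking, max_shift=2, shift_prob=0.2, rng=random.Random(0)))
```

Both routines leave most of the ranking intact while displacing a few items by at most one or two positions, which is consistent with the abstract's claim that the perturbations are subtle enough to shift fairness metrics and SHAP attributions without obvious artifacts.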