SHAP-PGD: A Realistic Adversarial Attack on Tabular Data by Unifying Interpretability and Semantic Consistency
Keywords: adversarial attack, tabular data, interpretability, semantic consistency
Abstract: Adversarial attacks on tabular data pose unique challenges: inter-feature constraints and semantic-realism requirements compel attackers to introduce minimal perturbations to as few features as possible in order to generate realistic adversarial samples. However, existing methods overly relax or even ignore these constraints and become trapped in local optima. Furthermore, because tabular data are highly abstract, current research rarely interprets how adversarial samples perturb model decisions, and the evaluation of semantic consistency remains ambiguous and poorly defined. To address these challenges, we propose SHAP-PGD, an interpretable white-box tabular adversarial attack under complex constraints. SHAP-PGD uses global attribution to identify the most influential features and applies a decoupled gradient-masking mechanism within this selected constraint-satisfying subspace to avoid local minima. This design enables the generation of realistic perturbations and enhances interpretability throughout the attack process. In addition, to investigate semantic consistency, we draw on both synthetic distribution and model utility, providing a concrete and scalable formulation of the consistency problem. Extensive experiments on four datasets and five victim architectures show that SHAP-PGD maintains semantic consistency while generating realistic perturbations and outperforms existing methods in 35 out of 40 settings, with an average robustness reduction of 59.4%.
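The core idea described in the abstract (attribution-guided feature selection followed by a gradient-masked PGD update) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the function name `masked_pgd_step`, the top-k selection rule, and the use of a precomputed attribution vector in place of actual SHAP values are all assumptions made for clarity.

```python
import numpy as np

def masked_pgd_step(x, grad, attributions, k=2, eps=0.1, alpha=0.02):
    """One hypothetical attribution-masked PGD step (illustrative sketch).

    x            -- clean input features (1-D array)
    grad         -- loss gradient w.r.t. x (same shape as x)
    attributions -- global attribution scores, e.g. mean |SHAP| per feature
    k            -- number of most influential features allowed to change
    eps          -- L_inf perturbation budget
    alpha        -- step size
    """
    # Build a binary mask selecting the k most influential features,
    # so the update stays inside the chosen feature subspace.
    mask = np.zeros_like(x)
    top_k = np.argsort(-np.abs(attributions))[:k]
    mask[top_k] = 1.0

    # Signed-gradient ascent restricted to the masked subspace.
    x_adv = x + alpha * np.sign(grad) * mask

    # Project back into the eps-ball around the clean input (L_inf).
    return np.clip(x_adv, x - eps, x + eps)
```

In a full attack this step would be iterated, with attributions obtained from a SHAP explainer over the training set and additional projection onto the dataset's inter-feature constraints; both are omitted here for brevity.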
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 3386