LFRD: Enhancing Adversarial Transferability via Low-Rank Feature Guidance and Representation Dispersion Regularization

20 Sept 2025 (modified: 02 Dec 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: adversarial attack, adversarial transferability, black-box attacks
TL;DR: We introduce LFRD, a transferable adversarial attack framework that combines SVD-extracted low-rank feature guidance with HHI-based representation dispersion regularization to improve adversarial transferability.
Abstract: Transfer-based adversarial attacks have become a mainstream approach for fooling modern deep neural networks. Numerous methods aim to enhance adversarial transferability by perturbing intermediate-layer features. However, existing methods tend to overfit to surrogate-specific features and produce imbalanced feature activations that generalize poorly to unseen models. To address these issues, we propose LFRD, a transferable adversarial attack framework that combines low-rank feature extraction with representation dispersion regularization. Specifically, Singular Value Decomposition (SVD) is employed to isolate low-rank components that capture the dominant, invariant semantic features shared across models, providing model-free guidance and mitigating surrogate-specific overfitting. In parallel, a regularization term based on the Herfindahl–Hirschman Index (HHI) balances feature activations by penalizing overly dominant responses and amplifying weaker ones. By jointly aligning perturbations with low-rank semantic structures and promoting dispersed feature utilization, LFRD yields adversarial examples with improved representation-level generalization. Experimental results on both standard and adversarially robust models show that LFRD achieves stronger adversarial transferability than state-of-the-art methods.
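
The abstract does not give the objective in closed form, so the sketch below shows one plausible PyTorch instantiation of the two ingredients it describes: an SVD-based low-rank reconstruction of an intermediate feature map, and an HHI penalty on per-channel activation energy. The function names (`low_rank_features`, `hhi_dispersion`, `lfrd_loss`), the rank and weight defaults, and the sign of the guidance term (pushing adversarial features away from the clean image's low-rank reconstruction) are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def low_rank_features(feat: torch.Tensor, rank: int = 8) -> torch.Tensor:
    """Keep the top-`rank` singular components of a (C, H, W) feature map.

    The map is flattened to a (C, H*W) matrix, decomposed with SVD, and
    reconstructed from its leading singular triplets, which are taken to
    carry the dominant, model-shared semantics.
    """
    c, h, w = feat.shape
    mat = feat.reshape(c, h * w)
    u, s, vh = torch.linalg.svd(mat, full_matrices=False)
    k = min(rank, s.numel())
    low = u[:, :k] @ torch.diag(s[:k]) @ vh[:k, :]
    return low.reshape(c, h, w)

def hhi_dispersion(feat: torch.Tensor, eps: float = 1e-12) -> torch.Tensor:
    """Herfindahl-Hirschman Index of per-channel activation energy.

    Channel energies are normalized into shares; the HHI (sum of squared
    shares) equals 1/C when activations are perfectly balanced and tends
    to 1 when a single channel dominates, so minimizing it disperses the
    representation.
    """
    energy = feat.flatten(1).abs().mean(dim=1)   # (C,) per-channel magnitude
    shares = energy / (energy.sum() + eps)       # normalize to shares
    return (shares ** 2).sum()

def lfrd_loss(adv_feat: torch.Tensor, clean_feat: torch.Tensor,
              rank: int = 8, lam: float = 1.0) -> torch.Tensor:
    """Hypothetical combined objective, minimized by the attacker."""
    # Guidance term: drive adversarial features away from the clean image's
    # low-rank (dominant-semantics) reconstruction -- one plausible reading
    # of "low-rank features guidance" (assumption, not the paper's formula).
    guide = low_rank_features(clean_feat, rank).detach()
    guidance_term = -F.mse_loss(adv_feat, guide)   # minimizing maximizes distance
    # Dispersion term: penalize concentrated channel usage via the HHI.
    dispersion_term = hhi_dispersion(adv_feat)
    return guidance_term + lam * dispersion_term
```

In an iterative attack such as I-FGSM, one would re-extract `adv_feat` from the surrogate's chosen intermediate layer at each step and descend `lfrd_loss` with respect to the perturbation; how LFRD actually weights and schedules the two terms is not specified in the abstract.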
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 24210