Keywords: Face Recognition, Adversarial Training, Semantic Attack, Latent Space, StyleGAN, Diffusion Models, Robustness, Defense, Adversarial Machine Learning, Attack, Computer Vision, Vision Transformer, CNN, Deep Learning, Machine Learning, Privacy, Security
TL;DR: We propose StyleAT, an adversarial defense built on a novel bounded semantic attack in the latent space, which achieves higher robust accuracy against state-of-the-art face-recognition attacks than common defenses.
Abstract: With face-recognition models now embedded in everyday authentication and surveillance, recent works have pinpointed a critical weakness: these models remain acutely vulnerable to adversarial semantic edits. That is, adversarially crafted semantic alterations to the input, such as slight aging or pose changes, can induce misclassification. While some existing attacks are powerful, they are computationally costly, making them impractical for building defenses (e.g., through adversarial training). To fill this gap, we introduce BoundStyle, a potent semantic attack operating in StyleGAN’s rich latent space to maximize misclassification rates. Notably, BoundStyle is significantly more efficient than equally powerful attacks, making it suitable for adversarial training. Building on BoundStyle, we develop StyleAT, an efficient adversarial training scheme that uses low-budget attack variants yet defends against stronger and unseen semantic attacks. We evaluate on two datasets unseen during training (LFW and VGG-Face) and five models, and find that StyleAT boosts robust accuracy against state-of-the-art attacks (DiffPrivate and BoundStyle) and outperforms common defenses (DOA and classical filters) in various settings.
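To make the mechanics concrete, below is a minimal PyTorch-style sketch of what a bounded latent-space attack and the corresponding adversarial-training step might look like. The abstract does not specify the paper's actual objective, bound, or hyperparameters, so every name and value here (`generator`, `fr_model`, `epsilon`, the step counts, and the cross-entropy objective) is an illustrative assumption, not BoundStyle's or StyleAT's real implementation; the generator is assumed frozen and `fr_model` is assumed to output identity logits.

```python
import torch
import torch.nn.functional as F

def bounded_latent_attack(generator, fr_model, w, labels,
                          epsilon=0.05, num_steps=10, step_size=0.01):
    """PGD-style untargeted attack on latent codes (hypothetical stand-in
    for a bounded semantic attack): maximize the recognizer's loss on the
    true identity while projecting the latent perturbation back into an
    L-inf ball of radius epsilon, so the edit stays a subtle semantic
    change rather than an unbounded distortion."""
    w_orig = w.detach()
    delta = torch.zeros_like(w_orig)
    for _ in range(num_steps):
        delta.requires_grad_(True)
        logits = fr_model(generator(w_orig + delta))  # classify the edited face
        loss = F.cross_entropy(logits, labels)
        grad = torch.autograd.grad(loss, delta)[0]
        # Ascend the loss, then project back into the epsilon-bound.
        delta = (delta + step_size * grad.sign()).clamp(-epsilon, epsilon).detach()
    return w_orig + delta

def adversarial_training_step(generator, fr_model, optimizer, w_batch, labels):
    """One adversarial-training step in the spirit described by the
    abstract: craft low-budget adversarial latents (small bound, few
    steps), then update the recognizer on the faces synthesized from
    them. The generator's weights are assumed frozen throughout."""
    w_adv = bounded_latent_attack(generator, fr_model, w_batch, labels,
                                  epsilon=0.02, num_steps=3)
    loss = F.cross_entropy(fr_model(generator(w_adv)), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The low-budget inner attack (few steps, tight bound) is what makes this kind of training loop tractable; the abstract's claim is that robustness then transfers to stronger and unseen semantic attacks at test time.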
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 19774