Depth-Aware Adversarial Training for Robust Image Classification

20 Sept 2025 (modified: 12 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: adversarial defense, adversarial robustness, depth estimation
Abstract: Adversarial examples exploit non-robust, imperceptible features to fool deep neural networks. To explain and address this problem, we propose Depth-Aware Adversarial Training (DAAT), which regularizes model attention to be consistent with scene geometry inferred from monocular depth. Concretely, DAAT leverages a pretrained (frozen) depth estimator to compute depth-gradient maps and imposes an alignment penalty that encourages a Vision Transformer to focus on depth-consistent cues while adversarial examples are generated during training, steering learning away from brittle texture signals toward geometry-aligned evidence. Empirically, on ImageNet-100, DAAT improves $L_\infty$ AutoAttack robust accuracy by 6.96\% over standard adversarial training while retaining strong clean performance (80.74\%). Theoretically, we further justify DAAT with two analyses: (i) a geometric account showing that small perturbations can distort inferred depth and shift decisions, whereas depth-aligned attention preserves 3D structure in the representation; and (ii) a robust-optimization view in which the alignment term tightens an upper bound on adversarial loss by constraining gradients along depth-inconsistent directions. These results indicate that integrating depth cues into training is a principled route to more robust and interpretable image classifiers, bridging adversarial robustness and 3D vision.
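The abstract describes an alignment penalty between attention maps and depth-gradient maps. A minimal sketch of how such a penalty could be computed is below; the function names, the finite-difference depth gradient, and the squared-error misalignment measure are illustrative assumptions, not the paper's actual implementation (which uses a pretrained depth estimator and ViT attention inside an adversarial-training loop).

```python
import numpy as np

def depth_gradient_map(depth):
    """Per-pixel depth-gradient magnitude via central finite differences.

    `depth` is an (H, W) monocular depth map (here a stand-in for the output
    of a frozen pretrained depth estimator). The result is normalized to sum
    to 1 so it is comparable to an attention distribution.
    """
    gy, gx = np.gradient(depth)
    mag = np.sqrt(gx ** 2 + gy ** 2)
    return mag / (mag.sum() + 1e-8)

def alignment_penalty(attention, depth):
    """Squared-error misalignment between a (normalized) attention map and
    the depth-gradient map; lower values mean attention tracks scene
    geometry. This is one plausible choice of penalty, assumed for
    illustration only.
    """
    attention = attention / (attention.sum() + 1e-8)
    return float(((attention - depth_gradient_map(depth)) ** 2).sum())
```

In the full method, a term like `alignment_penalty` would be added to the adversarial cross-entropy loss while perturbed examples are generated during training, so that gradients discourage attention on depth-inconsistent (texture-like) regions.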
Supplementary Material: zip
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 23200