Auditing Traffic-Sign Robustness via DDIM Inversion: Do Diffusion Latents Preserve Shadow Attacks?

Published: 27 Mar 2026, Last Modified: 11 Apr 2026, Venue: 6th AdvML, License: CC BY 4.0
Keywords: Adversarial robustness; traffic sign recognition; diffusion inversion; DDIM; shadow attacks; latent-space detection; autonomous driving safety; physical-world adversarial examples; Stable Diffusion; out-of-distribution generalization
TL;DR: DDIM inversion through domain-adapted Stable Diffusion preserves classifier behavior on traffic signs while exposing separable latent signatures of adversarial vs. benign shadows, enabling lightweight detection without pixel-space heuristics.
Abstract: Traffic-sign classifiers must remain reliable under physically plausible perturbations such as cast shadows, which can be optimized into stealthy adversarial attacks while appearing visually indistinguishable from benign environmental effects. Pixel-space diagnostics often fail to separate adversarial shadows from incidental ones without degrading scene content. We propose deterministic DDIM inversion through a domain-adapted Stable Diffusion v1.5 model as an analysis interface for this problem. After fine-tuning on an in-distribution GTSRB subset under four shadow-aware curricula, DDIM reconstructions are visually faithful and behavior-preserving for non-adversarial inputs (accuracy deltas $<3$ points; confidence shifts $<0.01$), while adversarial effects persist after reconstruction. A lightweight MLP trained on flattened inversion latents achieves ROC-AUC $\approx 0.96$ on the in-distribution split with meaningful transfer under class shift (GTSRB Yield) and dataset shift (LISA). These results position diffusion inversion as a practical auditing interface for shadow attacks, providing latent representations where adversarialness is measurable without pixel-space heuristics.
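Illustrative sketch: the deterministic DDIM inversion named in the abstract can be pictured with the toy NumPy example below. It is a minimal sketch under stated assumptions, not the submission's implementation. The function names (`ddim_step`, `ddim_invert`, `ddim_reconstruct`), the noise schedule, and the constant `eps_model` are all hypothetical stand-ins; in the described setting the noise predictor would be the fine-tuned Stable Diffusion v1.5 U-Net and the latent would come from its VAE encoder.

```python
import numpy as np

def ddim_step(x, eps, alpha_from, alpha_to):
    """Deterministic DDIM update (eta = 0) between two cumulative-alpha levels."""
    # Estimate the clean latent implied by the current latent and predicted noise.
    x0_pred = (x - np.sqrt(1.0 - alpha_from) * eps) / np.sqrt(alpha_from)
    # Re-noise that estimate to the target level using the same noise prediction.
    return np.sqrt(alpha_to) * x0_pred + np.sqrt(1.0 - alpha_to) * eps

def ddim_invert(x0, eps_model, alphas_cumprod):
    """Walk a clean latent toward higher noise levels (inversion)."""
    x = x0
    for t in range(len(alphas_cumprod) - 1):
        eps = eps_model(x, t)
        x = ddim_step(x, eps, alphas_cumprod[t], alphas_cumprod[t + 1])
    return x

def ddim_reconstruct(xT, eps_model, alphas_cumprod):
    """Walk an inverted latent back toward the clean level (reconstruction)."""
    x = xT
    for t in range(len(alphas_cumprod) - 1, 0, -1):
        eps = eps_model(x, t)
        x = ddim_step(x, eps, alphas_cumprod[t], alphas_cumprod[t - 1])
    return x

# Assumed toy setup: a 4x8x8 latent and a 50-step schedule (both hypothetical).
rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 8, 8))
alphas = np.linspace(0.999, 0.05, 50)  # index 0 = least noise, last = most noise

# Stand-in noise predictor: returns a fixed array, so the round trip is exact.
# A real U-Net depends on (x, t), which makes inversion only approximate.
fixed_eps = rng.standard_normal(x0.shape)
eps_model = lambda x, t: fixed_eps

xT = ddim_invert(x0, eps_model, alphas)          # inversion latent
x_rec = ddim_reconstruct(xT, eps_model, alphas)  # deterministic reconstruction
features = xT.reshape(-1)                        # flattened latent, e.g. as detector input
```

With a learned noise predictor the reconstruction is only approximately equal to the input, which is why the abstract reports reconstruction fidelity and classifier-behavior deltas rather than assuming an exact round trip.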
Submission Number: 15