Keywords: Physical adversarial attack; vehicle detectors; 3D texture camouflage; environment appearance transfer; affine transform; StyleGAN prior; on-manifold optimization; printability; multi-view robustness.
Abstract: We study full-coverage, printable 3D camouflage attacks on vehicle detectors. Our pipeline decouples photorealism from attackability by combining a closed-form \emph{Intrinsic Appearance Transfer} (IAT) module with an on-manifold StyleGAN texture prior under Expectation-over-Transformations (EOT) over camera and environment. IAT transfers exposure, white balance, tone, and veiling from a reference frame to the rendered image via per-pixel affine carriers and is training-free at test time; adversarial textures are optimized only through the early StyleGAN layers to preserve material plausibility. On a scene-controlled CARLA corpus spanning 22 weather/time presets, 8 azimuths, 9 elevations, 6 distances, and 3 locations, our method---optimized white-box on YOLOv3 and evaluated black-box on Faster R-CNN, RetinaNet, RTMDet, and DINO---reduces AP@0.5 from \mbox{0.75} to \mbox{0.11} on YOLOv3 ($-85.8\%$), with corresponding drops to \mbox{0.13} ($-82.5\%$) on Faster R-CNN, \mbox{0.22} ($-68.7\%$) on RetinaNet, \mbox{0.26} ($-67.1\%$) on RTMDet, and \mbox{0.59} ($-31.7\%$) on DINO. Averaged over detectors, AP@0.5 decreases from \mbox{0.7538} to \mbox{0.2863}, a relative drop of $\approx 62\%$. Ablations show that (i) sRGB-domain affine fits excel on unseen \emph{colors}, while linear-RGB fits excel on unseen \emph{textures}; and (ii) cross-color U-Net training with a content loss yields the best perceptual fidelity among learned baselines. Overall, a simple, differentiable IAT combined with a layer-restricted generative prior offers a practical path to robust, photorealistic 3D camouflage that transfers across models and conditions.
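The following is a minimal sketch of the two components named in the abstract, under assumed notation (the symbols $I_{\mathrm{ren}}$, $I_{\mathrm{ref}}$, $a$, $b$, $G$, $R$, $f$, $\mathcal{T}$, and $\mathcal{L}_{\mathrm{det}}$ are illustrative, not taken from the paper): IAT fits per-pixel affine carriers in closed form to carry the reference frame's appearance onto the render, and the adversarial texture is optimized only through the early StyleGAN latents under EOT over camera and environment.

% Illustrative formulation; assumed from the abstract, not the authors' exact objective.
% IAT: closed-form per-pixel affine carriers (a, b) carry exposure, white balance, tone,
% and veiling from the reference frame I_ref onto the rendered image I_ren.
\[
  \hat{I} \;=\; a \odot I_{\mathrm{ren}} + b,
  \qquad
  (a, b) \;=\; \arg\min_{a,b} \,\bigl\lVert a \odot I_{\mathrm{ren}} + b - I_{\mathrm{ref}} \bigr\rVert_2^2 ,
\]
% On-manifold texture attack: only the early StyleGAN latents w_{1:k} are updated,
% under EOT over camera/environment transformations t ~ T.
\[
  \min_{w_{1:k}} \; \mathbb{E}_{t \sim \mathcal{T}}
  \Bigl[ \mathcal{L}_{\mathrm{det}}\bigl( f(\hat{I}_t) \bigr) \Bigr],
  \qquad
  \hat{I}_t \;=\; \mathrm{IAT}\Bigl( t \circ R\bigl( G(w_{1:k},\, w^{0}_{k+1:L}) \bigr) \Bigr),
\]
where, in this assumed notation, $G$ is the StyleGAN generator whose first $k$ layer latents are optimized while the remaining latents are held at their initial values $w^{0}_{k+1:L}$, $R$ renders the textured vehicle, $t \sim \mathcal{T}$ samples camera and environment transformations, $f$ is the white-box detector (YOLOv3), and $\mathcal{L}_{\mathrm{det}}$ is a detection-suppression loss.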
Primary Area: applications to robotics, autonomy, planning
Submission Number: 22706