Embodied Laser Attack: Leveraging Scene Priors to Achieve Agent-based Robust Non-contact Attacks

Published: 20 Jul 2024, Last Modified: 21 Jul 2024 | MM2024 Poster | CC BY 4.0
Abstract: As physical adversarial attacks are increasingly used to uncover potential risks in security-critical scenarios, especially dynamic ones, their vulnerability to environmental variations has also come to light. This lack of robustness leads to unstable attack performance. Although methods such as Expectation over Transformation (EOT) have improved the robustness of traditional contact attacks like adversarial patches, they fall short in practicality and concealment within dynamic environments such as traffic scenarios. Meanwhile, non-contact laser attacks offer greater adaptability but are constrained by the limited optimization space of their attributes, which renders EOT less effective. This limitation calls for a new strategy to improve the robustness of such attacks. To address these issues, this paper introduces the Embodied Laser Attack (ELA), a novel framework that leverages the embodied intelligence paradigm of Perception-Decision-Control to dynamically tailor non-contact laser attacks. For the perception module, given the challenge of simulating the victim's view by full-image transformation, ELA develops a local perspective transformation network that builds on the intrinsic prior knowledge of traffic scenes and enables effective and efficient estimation. For the decision and control module, ELA trains an attack agent with data-driven reinforcement learning and well-designed rewards instead of adopting time-consuming heuristic algorithms, so that the agent can instantaneously determine a valid attack strategy from the perceived information; the strategy is then carried out by a controllable laser emitter. Experimentally, we apply our framework to diverse traffic scenarios in both the digital and physical worlds, verifying the effectiveness of our method under dynamic successive scenes.
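For readers unfamiliar with EOT, the objective it optimizes is commonly written in roughly the following form. This is a generic sketch, not a formula taken from this paper: the symbols (perturbation $\delta$, transformation distribution $T$, classifier $f$, loss $\mathcal{L}$, input $x$, label $y$, feasible set $\mathcal{S}$) are assumptions introduced here for illustration.

```latex
\delta^{*} \;=\; \arg\max_{\delta \in \mathcal{S}} \;
  \mathbb{E}_{t \sim T}\!\left[ \mathcal{L}\big( f\big(t(x + \delta)\big),\, y \big) \right]
```

Under this reading, a small feasible set $\mathcal{S}$ (for example, a laser described by only a few attributes) leaves little room for the expectation over $T$ to be optimized, which is consistent with the abstract's remark that EOT is less effective for laser attacks.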
Primary Subject Area: [Experience] Multimedia Applications
Relevance To Conference: This paper addresses the object recognition task in videos and aims to improve non-contact attacks, whose robustness is harder to enhance in dynamic scenarios. Specifically, we propose the Embodied Laser Attack (ELA), which follows the embodied intelligence paradigm and bolsters the robustness of physical non-contact attacks through active scene understanding and self-adaptation in such successive scenarios. A key contribution is that we leverage inherent geometric priors of dynamic scenes to achieve active scene understanding from dual-view video data, focusing on environmental variations such as an object’s location and rotation, which are the main variations in real-life settings like traffic scenarios. For the perception module, we propose PTN, which accurately simulates the target’s distortion states from third-party accessible data based on specific spatio-temporal relationships. Rather than relying on multi-view imagery for full-scene novel view synthesis, our lightweight implementation achieves real-time simulation of key regions, which meets the need for instant inference under dynamic conditions. In addition, ELA is the first to propose an attack agent trained via reinforcement learning and implemented with a flexible laser medium, which enables instantaneous determination of the attack strategy. Overall, our paper combines multimedia technology with the field of security and provides a new solution for continuous attacks in dynamic scenarios, which is of research significance for the real-life deployment of DNNs.
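As a rough illustration of what simulating a key region under a perspective change involves geometrically, the sketch below estimates a planar perspective transformation (a homography) from four corner correspondences and applies it to points. It is not the paper's PTN, which is a learned network; the function names `estimate_homography` and `warp_points` and the example coordinates are hypothetical and chosen only for this sketch.

```python
import numpy as np

# Hypothetical helpers for illustration only; not the paper's PTN.

def estimate_homography(src, dst):
    """Estimate a 3x3 homography H such that dst ~ H @ src (direct linear transform)."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two linear constraints on the 9 entries of H.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.array(rows)
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]  # normalize so that H[2, 2] == 1

def warp_points(H, pts):
    """Apply homography H to an (N, 2) array of points."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

# Example: corners of a unit square (e.g. a sign seen head-on) and assumed
# corner positions of the same region as seen from another viewpoint.
src = [(0, 0), (1, 0), (1, 1), (0, 1)]
dst = [(10, 20), (50, 25), (45, 60), (12, 55)]
H = estimate_homography(src, dst)
center_in_other_view = warp_points(H, [(0.5, 0.5)])
```

Warping only the corners of a small region, rather than synthesizing the whole scene from a new viewpoint, is one way to see why a local transformation can be cheap enough for real-time use.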
Supplementary Material: zip
Submission Number: 1414