Closed-Box Adversarial Attack Method for Object Detection Under Multiview Conditions

Yun Zhang, Zhenhua Yu, Zheng Yin, Ou Ye, Xuya Cong, Houbing Herbert Song

Published: 15 Jul 2025, Last Modified: 04 Nov 2025, IEEE Internet of Things Journal, CC BY-SA 4.0
Abstract: Deep learning-based object detection has become an important application in industrial IoT. However, studies have shown that adversarial attacks can cause object detectors to output incorrect results. Such vulnerabilities threaten the robustness of object detection systems and can lead to security problems. Existing adversarial attack methods are often ineffective when the target is observed from different viewpoints. To address this issue, this article proposes an adversarial attack method with multiview adaptive weight-balancing. First, a multiview channel is constructed for training so that target features under different viewpoints are considered jointly, enhancing the robustness of the attack. Then, the model shake-drop and patch cut-out algorithms are combined during training so that the attack no longer depends on a single model, which improves its generalization ability. Finally, a weight-balancing strategy dynamically adjusts the weight of each viewpoint, adaptively shifting the training emphasis among viewpoints to strengthen the attack under every viewpoint. To verify the performance of the method, experiments are conducted on multiple benchmarks, including the PKU-Reid dataset. Compared with mainstream methods, the proposed method improves the attack success rate by 3.78% and 19.26% under glass-box and closed-box conditions, respectively, while reducing the mean average precision of the object detection model by 2.18% and 11.12%, respectively. The experimental results demonstrate that the proposed method effectively enhances attack performance on targets from different viewpoints and exhibits better viewpoint robustness.
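The abstract does not give the weighting formula, so the following is only a minimal sketch of what a multiview weighted attack loss with adaptive per-view weights could look like, under stated assumptions. It assumes each viewpoint yields a scalar attack loss (for example, the detector's highest person confidence on the patched image seen from that view), that views where the attack is currently weakest (higher loss) should receive more weight, and that weights are a softmax over losses smoothed by an exponential moving average. The functions `balance_view_weights` and `multiview_attack_loss`, the `temperature` and `momentum` parameters, and the view names are all hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, Optional


def balance_view_weights(
    view_losses: Dict[str, float],
    prev_weights: Optional[Dict[str, float]] = None,
    temperature: float = 1.0,
    momentum: float = 0.9,
) -> Dict[str, float]:
    """Hypothetical adaptive weight-balancing rule (an assumption, not the paper's formula).

    Each value in `view_losses` is the attack loss for one viewpoint. A higher
    loss means the patch is currently weaker on that view, so that view gets a
    larger weight. Weights are a softmax over the losses, optionally smoothed
    with an exponential moving average so the preference shifts gradually.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if not 0.0 <= momentum < 1.0:
        raise ValueError("momentum must be in [0, 1)")
    names = list(view_losses)
    scaled = [view_losses[n] / temperature for n in names]
    peak = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    target = {n: e / total for n, e in zip(names, exps)}
    if prev_weights is None:
        return target
    # A convex combination of two probability distributions still sums to 1.
    return {n: momentum * prev_weights[n] + (1.0 - momentum) * target[n] for n in names}


def multiview_attack_loss(view_losses: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of per-view losses that a patch optimizer would minimize."""
    return sum(weights[n] * view_losses[n] for n in view_losses)


if __name__ == "__main__":
    # Made-up per-view losses for illustration only; these are not results from the paper.
    losses = {"front": 0.2, "side": 0.5, "back": 0.9}
    weights = balance_view_weights(losses)
    print({n: round(w, 3) for n, w in weights.items()})
    print(round(multiview_attack_loss(losses, weights), 3))
```

In this sketch the "back" view, where the hypothetical loss is highest, receives the largest weight, so the next optimization step concentrates on the viewpoint the patch handles worst. The moving average (`momentum`) is one plausible way to keep the preference from oscillating between views from one iteration to the next; the paper may use a different rule.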