Abstract: Computer Vision and Natural Language Processing have made significant advances, leading to multimodal models that seamlessly integrate diverse input modalities. While these models hold great potential for many applications, their reliance on multiple input channels also broadens their attack surface. Unfortunately, backdoor attacks targeting multimodal models have received minimal attention. In this work, we propose a stealthy backdoor attack on multimodal deep learning models called the “Invisible Multi-trigger Multimodal Backdoor.” Inspired by steganography, our method exploits the features of multiple input sources to covertly embed multiple triggers that are imperceptible to humans, enabling a hidden backdoor in multimodal models. To the best of our knowledge, this is the first work to apply steganography techniques in the context of multimodal models. Experimental results demonstrate the effectiveness and scalability of our approach: the attack succeeds without degrading the model’s performance on clean inputs. With a 1% poisoning rate, our attack achieves a 98.9% attack success rate while maintaining high model accuracy. To foster research on defenses against multimodal backdoors, we release a poisoned COCO dataset (TrojCOCO), curated specifically for studying the vulnerabilities associated with such attacks.
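
The abstract does not detail the steganographic embedding mechanism, so as a purely illustrative sketch (not the paper’s actual method), the snippet below shows one common way an imperceptible trigger could be hidden in an image: writing a binary pattern into the least significant bits of pixel values. The function name and parameters are hypothetical.

```python
# Illustrative sketch only: a generic least-significant-bit (LSB) steganographic
# trigger. The paper's embedding scheme may differ; this is an assumption.
import numpy as np

def embed_lsb_trigger(image: np.ndarray, trigger_bits: np.ndarray) -> np.ndarray:
    """Hide a binary trigger in the least significant bits of the first pixels.

    image: uint8 array of shape (H, W, C).
    trigger_bits: 1-D array of 0/1 values with fewer elements than H*W*C.
    """
    flat = image.reshape(-1).copy()
    n = trigger_bits.size
    # Clear the LSB of the first n values, then write the trigger bits into them.
    flat[:n] = (flat[:n] & 0xFE) | trigger_bits.astype(np.uint8)
    return flat.reshape(image.shape)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
    trigger = rng.integers(0, 2, size=64, dtype=np.uint8)
    poisoned = embed_lsb_trigger(img, trigger)
    # Each pixel value changes by at most 1, so the trigger is visually imperceptible.
    print(int(np.abs(poisoned.astype(int) - img.astype(int)).max()))  # <= 1
```

A comparable hiding step could, in principle, be applied to the other input modality (e.g., low-salience perturbations of text tokens), which is what makes a multi-trigger multimodal backdoor plausible; the specifics here are assumptions, not claims about the authors’ implementation.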