TL;DR: This paper introduces MP-Nav, a plug-and-play module designed to evaluate (and even enhance) data poisoning attacks against multimodal models.
Abstract: Despite the success of current multimodal learning at scale, its susceptibility to data poisoning attacks poses security concerns in critical applications. An attacker can manipulate model behavior by injecting a small number of maliciously crafted instances into the training set, stealthily mismatching distinct concepts. Recent studies have demonstrated this vulnerability by poisoning multimodal tasks such as Text-Image Retrieval (TIR) and Visual Question Answering (VQA). However, current attack methods rely only on random choices of concepts for misassociation and random instance selection for injecting the poisoning noise, which often yields suboptimal effects and even risks failure because the poisons are diluted by the large number of benign instances. This study introduces MP-Nav (Multimodal Poison Navigator), a plug-and-play module designed to evaluate and even enhance data poisoning attacks against multimodal models. MP-Nav operates at both the concept and instance levels, identifying semantically similar concept pairs and selecting robust instances to maximize attack efficacy. Experiments corroborate that MP-Nav can significantly improve the efficacy of state-of-the-art data poisoning attacks such as AtoB and ShadowCast on multimodal tasks while maintaining model utility across diverse datasets. Notably, this study underscores the vulnerabilities of multimodal models and calls for corresponding defenses.
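To make the concept- and instance-level selection concrete, below is a minimal, hypothetical sketch (not the authors' released code) of how one might rank candidate concept pairs by embedding similarity and pick representative instances of the source concept; the function names, the use of cosine similarity, and the centroid-closeness proxy for "robustness" are illustrative assumptions.

```python
# Hypothetical sketch: concept-pair and instance selection via embedding similarity.
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity between two feature vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def pick_concept_pair(concept_embeddings):
    """Return the pair of distinct concepts whose embeddings are most similar."""
    names = list(concept_embeddings)
    best, best_pair = -1.0, None
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            s = cosine_sim(concept_embeddings[a], concept_embeddings[b])
            if s > best:
                best, best_pair = s, (a, b)
    return best_pair, best

def pick_robust_instances(instance_embeddings, concept_centroid, k=5):
    """Rank instances by closeness to their concept centroid (an assumed robustness proxy)."""
    scores = [cosine_sim(e, concept_centroid) for e in instance_embeddings]
    return np.argsort(scores)[::-1][:k]

# Toy usage with random vectors standing in for CLIP-style features.
rng = np.random.default_rng(0)
concepts = {c: rng.normal(size=512) for c in ["dog", "wolf", "airplane"]}
pair, sim = pick_concept_pair(concepts)
instances = rng.normal(size=(100, 512))
top = pick_robust_instances(instances, concepts[pair[0]], k=5)
print(pair, round(sim, 3), top)
```

The actual module may use task-specific features and selection criteria; this sketch only illustrates the general idea of guided (rather than random) concept and instance selection described in the abstract.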
Lay Summary: Multimodal artificial intelligence (AI) systems, which combine information from images and text, are becoming increasingly popular and powerful. These systems help with tasks like finding images that match a description or answering questions about pictures. However, as they grow in use, especially in sensitive areas like healthcare or autonomous vehicles, it is important to understand and prevent possible vulnerabilities. One such vulnerability is called a data poisoning attack. In this type of attack, a bad actor sneaks a few carefully chosen and slightly altered examples into the training data used to teach the AI. These hidden changes can cause the model to learn incorrect associations, such as confusing one type of medicine with another, without noticeably affecting its general performance. This makes the attack hard to detect but potentially dangerous. This paper introduces a new tool called MP-Nav that comprehensively uncovers this vulnerability to data poisoning in multimodal AI systems.
Primary Area: Social Aspects->Safety
Keywords: data poisoning attack, multimodal AI models
Submission Number: 3145