How Robust is Google's Bard to Adversarial Image Attacks?

Published: 01 Nov 2023, Last Modified: 12 Dec 2023, R0-FoMo Poster
Keywords: Adversarial robustness, multimodal large language model, black-box attack
TL;DR: We analyze the robustness of Google's Bard, as a representative MLLM, to adversarial image attacks.
Abstract: Multimodal Large Language Models (MLLMs) that integrate text and other modalities (especially vision) have achieved unprecedented performance on various multimodal tasks. However, because the adversarial robustness problem of vision models remains unsolved, introducing vision inputs exposes MLLMs to more severe safety and security risks. In this work, we study the adversarial robustness of commercial MLLMs, especially Google's Bard, a representative chatbot with multimodal capability. By attacking white-box surrogate vision encoders or MLLMs, the generated adversarial examples can mislead Bard into outputting wrong image descriptions with a 22\% success rate based solely on transferability. We demonstrate that these adversarial examples can also attack other MLLMs, e.g., with a 45\% attack success rate against GPT-4V, a 26\% attack success rate against Bing Chat, and an 86\% attack success rate against ERNIE Bot. Moreover, we identify two defense mechanisms of Bard: face detection and toxicity detection of images. We design corresponding attacks to evade these defenses, demonstrating that Bard's current defenses are also vulnerable. We hope this work can deepen our understanding of the robustness of MLLMs and facilitate future research on defenses. Our code is available at https://github.com/thu-ml/Attack-Bard.
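The following is a minimal sketch, not the authors' released code (see the repository above), of the transfer-based attack idea described in the abstract: perturb an image under a small L-infinity budget so that a white-box surrogate vision encoder produces an embedding far from the clean one, then submit the resulting image to a black-box MLLM such as Bard. The choice of surrogate (an open_clip ViT-B/32), the embedding-distance loss, and the hyperparameters are assumptions for illustration only.

```python
# Sketch of a PGD-style attack on a surrogate vision encoder (assumed setup,
# not the paper's exact method). Requires: torch, open_clip_torch.
import torch
import open_clip

# Hypothetical surrogate: an open-source CLIP image encoder.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="openai")
model.eval()

def embedding_attack(image, eps=8 / 255, alpha=1 / 255, steps=100):
    """Push the surrogate's image embedding away from the clean embedding.

    image: float tensor in [0, 1], shape (1, 3, H, W), resized to the
    encoder's input resolution. The encoder's mean/std normalization is
    omitted here for brevity; fold it into the forward pass in practice.
    """
    with torch.no_grad():
        clean_emb = model.encode_image(image)

    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        adv_emb = model.encode_image((image + delta).clamp(0, 1))
        # Maximize the distance between adversarial and clean embeddings.
        loss = torch.nn.functional.mse_loss(adv_emb, clean_emb)
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()   # gradient-ascent step
            delta.clamp_(-eps, eps)              # project to the L_inf ball
            delta.grad.zero_()
    return (image + delta).clamp(0, 1).detach()
```

The resulting adversarial image would then be sent to the target chatbot; success is judged by whether the returned description no longer matches the image, which is how the transfer success rates above are measured.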
Submission Number: 14