Red Teaming Visual Language Models

Anonymous

16 Feb 2024 · ACL ARR 2024 February Blind Submission
Abstract: VLMs (Vision-Language Models) extend the capabilities of LLMs (Large Language Models) to accept multimodal inputs. Since it has been verified that LLMs can be induced to generate harmful or inaccurate content through specific test cases (termed red teaming), how VLMs perform in similar scenarios, especially given their combination of textual and visual inputs, remains an open question. To explore this problem, we present RTVLM, a novel red teaming dataset that encompasses 12 subtasks (e.g., image misleading, multimodal jailbreaking, face fairness) under 4 primary aspects (faithfulness, privacy, safety, fairness). RTVLM is the first red teaming dataset to benchmark current VLMs along these 4 aspects. Detailed analysis shows that 10 prominent open-source VLMs struggle with red teaming to varying degrees, with performance gaps of up to 31% relative to GPT-4V. Additionally, we apply red teaming alignment to LLaVA-v1.5 via Supervised Fine-Tuning (SFT) on RTVLM; this improves performance by 10% on the RTVLM test set and 13% on MM-Hallu, with no noticeable decline on MM-Bench, surpassing other LLaVA-based models of similar size trained with regular alignment data. This reveals that current open-source VLMs still lack red teaming alignment. Our code and datasets will be open-sourced.
Paper Type: long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Contribution Types: Data resources
Languages Studied: English
Preprint Status: There is a non-anonymous preprint (URL specified in the next question).
A1: yes
A1 Elaboration For Yes Or No: Section 6 Limitations
A2: yes
A2 Elaboration For Yes Or No: Section 7 Ethical consideration
A3: yes
A3 Elaboration For Yes Or No: Abstract and Introduction
B: yes
B1: yes
B1 Elaboration For Yes Or No: Section 2 RTVLM Dataset
B2: yes
B2 Elaboration For Yes Or No: Section 2 RTVLM Dataset
B3: yes
B3 Elaboration For Yes Or No: Section 2 RTVLM Dataset
B4: yes
B4 Elaboration For Yes Or No: Section 2 RTVLM Dataset
B5: yes
B5 Elaboration For Yes Or No: Section 2 RTVLM Dataset
B6: yes
B6 Elaboration For Yes Or No: Section 2 RTVLM Dataset
C: no
C1: yes
C1 Elaboration For Yes Or No: Section 3 Experimental Results
C2: yes
C2 Elaboration For Yes Or No: 3.1 Experimental Settings
C3: yes
C3 Elaboration For Yes Or No: 3.2 Red Teaming Test Results
C4: yes
C4 Elaboration For Yes Or No: 3.2 Red Teaming Test Results
D: yes
D1: yes
D1 Elaboration For Yes Or No: Appendix D
D2: yes
D2 Elaboration For Yes Or No: Ethical consideration
D3: yes
D3 Elaboration For Yes Or No: Ethical consideration
D4: yes
D4 Elaboration For Yes Or No: Ethical consideration
D5: yes
D5 Elaboration For Yes Or No: Ethical consideration
E: yes
E1: yes
E1 Elaboration For Yes Or No: Ethical consideration