Fairness Failure Modes of Multimodal LLMs
Track: Main Papers Track (6 to 9 pages)
Keywords: Multimodal LLM, Fairness, Benchmarking
TL;DR: We construct MultiBBQ, a new dataset for controllable fairness evaluation of MLLMs, and diagnose four critical Fairness Failure Modes through comprehensive benchmarking and systematic analysis.
Abstract: Although Multimodal Large Language Models (MLLMs) are increasingly deployed in high-stakes domains, the fairness of their outputs is under-explored. Building on the BBQ language bias benchmark, we construct MultiBBQ, a new dataset that combines attested social biases with AI-generated photorealistic images for controllable fairness evaluation of MLLMs in both visual-only and visual-language contexts. We propose two metrics, Fairness Score and Bias Score, and design an evaluation paradigm that addresses the challenges of shortcut reasoning and data contamination. Through comprehensive benchmarking, we diagnose four new Fairness Failure Modes of MLLMs. In particular, we find that proprietary models may fail to conduct effective counter-bias reasoning in disambiguated contexts due to over-refusal, while open-source models are deficient at abstaining in ambiguous contexts. We also analyze how different input and model factors degrade fairness, demonstrate that MLLMs amplify bias relative to their backbone LLMs, and show the potentially limited effectiveness of mitigation methods such as reasoning and fairness instruction. We release our code and dataset to facilitate further evaluations and the development of mitigation methods.
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 62