Color Blindness Test Images as Seen by Large Vision-Language Models

ICLR 2026 Conference Submission18582 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: Large vision-language models
Abstract: Large vision-language models (LVLMs) are fairly capable of understanding this colorful world, yet their reasoning is grounded in highly entangled semantics, leaving open the question of whether they genuinely perceive colors in a human-like manner. Although they can correctly answer color-related questions, they might internally rely on prior knowledge and correlations between color and other semantics instead of directly processing color itself. To investigate this question, we study how LVLMs perceive color blindness test images (CBTIs), and we conclude that CBTIs as seen by LVLMs differ from CBTIs as seen by humans. Specifically, in this paper, we create IshiharaColorBench following the Ishihara test, in which LVLMs must directly process colors: the digit in any test image can be recognized if and only if the model genuinely perceives colors. We perform two types of tests: standard color blindness tests for performance assessment and controlled color sensitivity tests for behavior analysis. On the former, LVLMs perform close to random guessing, and neither scaling up nor fine-tuning leads to generalizable improvement; on the latter, we find several systematic biases, such as a preference for red and a sensitivity to saturation contrast but not to brightness contrast. Our findings reveal notable limitations of existing LVLMs in genuine color perception, highlighting the need for novel model architectures or training strategies toward a smarter and more human-aligned perceptual foundation.
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 18582
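The abstract does not spell out how the IshiharaColorBench plates are generated, but the "if and only if" condition implies that the digit must be separable from the background only through color, not through brightness or shape cues. Below is a minimal, hypothetical sketch of how such an Ishihara-style plate could be rendered with Pillow; the function names (digit_mask, ishihara_plate), the hue values, and the dot-placement heuristic are illustrative assumptions, not the authors' actual pipeline.

```python
import colorsys
import random

import numpy as np
from PIL import Image, ImageDraw, ImageFont


def digit_mask(digit: str, size: int = 512) -> np.ndarray:
    """Render a digit on a small canvas and upscale it into a boolean figure mask."""
    small = Image.new("L", (16, 16), 0)
    draw = ImageDraw.Draw(small)
    draw.text((3, 2), digit, fill=255, font=ImageFont.load_default())
    big = small.resize((size, size), Image.NEAREST)
    return np.asarray(big) > 0


def ishihara_plate(digit: str = "7", size: int = 512,
                   figure_hue: float = 0.02, ground_hue: float = 0.30,
                   n_dots: int = 3000, seed: int = 0) -> Image.Image:
    """Fill the canvas with random dots; figure and ground dots differ only in hue,
    while lightness is held constant so brightness carries no signal about the digit."""
    rng = random.Random(seed)
    mask = digit_mask(digit, size)
    img = Image.new("RGB", (size, size), "white")
    draw = ImageDraw.Draw(img)
    for _ in range(n_dots):
        x, y = rng.randrange(size), rng.randrange(size)
        r = rng.randint(4, 10)
        hue = figure_hue if mask[y, x] else ground_hue
        sat = rng.uniform(0.5, 0.9)          # jitter saturation for texture
        rgb = colorsys.hls_to_rgb(hue, 0.55, sat)  # fixed lightness for figure and ground
        color = tuple(int(255 * c) for c in rgb)
        draw.ellipse([x - r, y - r, x + r, y + r], fill=color)
    return img


if __name__ == "__main__":
    ishihara_plate("7").save("plate_7.png")
```

Because both figure and ground dots share the same fixed lightness, a grayscale rendering of such a plate contains no trace of the digit, which is the property the benchmark relies on to force genuine color perception.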