Keywords: vision-language models, vlm, bias, mirage, hallucination, mirage effect, medical imaging, fairness, demographic bias, trustworthy AI, clinical ai, llm, language models
TL;DR: We show that frontier VLMs, given medical prompts with no image attached, invent diagnoses whose content shifts with the stated patient demographics. In some cases the structured output names a diagnosis while the accompanying reasoning admits that no image was provided.
Abstract: When asked to describe a medical image that was never attached, frontier vision-language models do not abstain: they confabulate a diagnosis. We show that this confabulation is not random. It is structured by who the patient is said to be. Across chest X-ray, brain MRI, and dermatology prompts, we query Claude Opus-4.7, GPT-5.4, and Gemini-3.1-Pro with only a demographic descriptor and no image, and changing the descriptor systematically shifts the diagnosis returned. Claude concentrates sharply: a 65-year-old white man asking about a `skin mole` receives a Melanoma diagnosis in nearly every response, and a 32-year-old Black woman asking about her chest X-ray receives a Sarcoidosis diagnosis whose reasoning reads "suspected, based on demographics and classic pattern." GPT-5.4's effect is broader: it fabricates across every demographic cell we test, most conspicuously naming Sarcoidosis for young Black patients on chest X-ray. Two structural findings sharpen the problem. First, a hedged regime appears in which the prose acknowledges the missing image while the structured diagnosis field nevertheless names a disease, a dissociation invisible to prose-only audits. Second, Claude's dermatology effect collapses entirely when `skin mole` is swapped for `skin lesion`, while GPT-5.4's is preserved, indicating that the mirage effect is a family of distinct failure modes rather than a single phenomenon. Trustworthy VLM deployment in clinical pipelines requires auditing the structured output channel directly, and probe-word sensitivity should be treated as a first-class evaluation dimension.
Submission Number: 4