Keywords: Cognitive inertia, Contextual contamination, Multimodal AI errors, Common-sense failure, Human oversight essential
TL;DR: AI in medicine often makes unexpected errors due to cognitive inertia and contextual contamination: strong prior contexts distort the interpretation of unrelated visual facts. Human oversight remains critical.
Abstract: Background: While state-of-the-art artificial intelligence (AI) models achieve human expert-level performance in specific medical imaging tasks, they exhibit fundamental limitations when encountering out-of-distribution (OOD) inputs, committing seemingly basic errors that defy common-sense reasoning. This study extends beyond the known issue of AI’s anatomical common-sense deficits by exploring a “hierarchical error” mechanism wherein one dominant context distorts the interpretation of adjacent, logically unrelated information.
Methods: We designed a multi-stage qualitative experimental framework. First, we established baseline performance by presenting several cutting-edge multimodal AI systems with standard radiological images and normal anatomical illustrations. Next, we observed common-sense failures by testing the models on anatomically impossible (“nonsensical”) images. Finally, we assessed AI responses to an abstract image of six converging lines both in isolation and when paired with an abnormal six-fingered hand emoji.
Results: The AI systems performed with high accuracy on standard medical images and normal illustrations, but they completely failed to recognize the inherent impossibility of the nonsensical images. Most significantly, when the abstract line image was presented alone, the AI correctly identified six lines; however, when the identical image was shown alongside the abnormal hand emoji, the AI incorrectly reported five lines. This demonstrates how a powerful semantic context (in this case, “hand = five”) can hierarchically dominate and contaminate the processing of visual information in an otherwise unrelated image.
Conclusions: AI’s most fundamental errors extend beyond simple pattern-recognition failures. They stem from structural limitations characterized by “cognitive inertia” (entrenchment in dominant prior assumptions) and “contextual contamination” (distortion of surrounding information by those assumptions). This represents a critical limitation that cannot be resolved with prompt engineering or other user-level fixes alone. Human critical thinking and oversight therefore remain essential to complement AI’s autonomous diagnostic capabilities.
Submission Number: 73