Vision-Language Asymmetry in Bistable Image Captioning

Published: 04 Jun 2026, Last Modified: 06 Jun 2026, Venue: PhilML@ICML 2026 Poster, License: CC BY 4.0
Keywords: mechanistic interpretability, sparse autoencoders, vision-language models, bistable images, causal steering, aspect-seeing, philosophy of perception, Wittgenstein, CLIP, LLaVA
TL;DR: Vision-language models represent both aspects of bistable images at the vision encoder but commit to one at the language decoder, operationalizing Wittgenstein's seeing/seeing-as distinction.
Abstract: Wittgenstein’s duck–rabbit poses a question for vision-language models: when a model captions an ambiguous image, where in the model is the commitment to one aspect made? We address this with a 3,320-generation behavioral baseline over 83 bistable stimuli that surfaces three regimes (default-dominant, force-dominant, force-balanced) under neutral vs forced-choice prompting, then probe the underlying representations using a TopK sparse autoencoder we train on the CLIP layer that LLaVA-1.6-7B actually consumes (validation EV 0.93). Across 69 bistable stimuli with both per-aspect feature pools available, 72% (50/69) show simultaneous activation of both pools at the vision tower, including 12/12 default-dominant duck/rabbit and 7/8 force-balanced young/old. Causal steering at CLIP layer 22 flips captions on default-dominant stimuli (33% rabbit-flip rate under a fluency guard) but cannot flip captions on force-balanced young/old at any tested coefficient, despite their vision-side superposition. The dominance bottleneck lives downstream of the vision tower; the gap between vision-side representation and language-side commitment is an empirical handle on the seeing/seeing-as distinction. We also flag a methodological note: rank-based statistics on TopK SAE outputs require tie-corrected ranking to avoid silent row-order bias.
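The methodological note on tie-corrected ranking can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy activation vector, and the use of a Spearman-style correlation are all assumptions made for illustration. The idea is that a TopK SAE feature is exactly zero on most rows, so a ranking that breaks ties by position assigns the zeros ranks in row order, and the resulting statistic changes when the rows are reordered. Giving tied values their average rank removes that dependence.

```python
import numpy as np

def ordinal_ranks(x):
    """Ranks 1..n, with ties broken by row position (the biased variant)."""
    order = np.argsort(x, kind="stable")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    return ranks

def average_ranks(x):
    """Tie-corrected ranks: tied values share the mean of their ordinal ranks."""
    x = np.asarray(x)  # assumed 1-D
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    ends = np.cumsum(counts)       # last ordinal rank in each tie group
    starts = ends - counts + 1     # first ordinal rank in each tie group
    return ((starts + ends) / 2.0)[inverse]

def spearman(x, y, rank_fn):
    """Pearson correlation of the ranks produced by rank_fn."""
    return float(np.corrcoef(rank_fn(x), rank_fn(y))[0, 1])

# Hypothetical sparse feature activations (mostly exact zeros, as TopK yields)
# paired with an arbitrary per-row score.
a = np.array([0.0, 0.0, 0.0, 0.0, 0.7, 0.0, 0.0, 0.2])
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])

# Reverse the rows of both arrays together; the (a, y) pairs are unchanged.
a_rev, y_rev = a[::-1], y[::-1]

rho_ord = (spearman(a, y, ordinal_ranks), spearman(a_rev, y_rev, ordinal_ranks))
rho_avg = (spearman(a, y, average_ranks), spearman(a_rev, y_rev, average_ranks))

print("ordinal ranks, two row orders:", rho_ord)
print("average ranks, two row orders:", rho_avg)
```

In this toy case the position-based ranking gives different correlations for the two row orders even though the data pairs are identical, while the average-rank version gives the same value for both.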
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 63