LaVCa: LLM-assisted Visual Cortex Captioning

ICLR 2026 Conference Submission19639 Authors

19 Sept 2025 (modified: 23 Dec 2025), ICLR 2026 Conference Submission, CC BY 4.0

Keywords: Neuroscience, Computer vision, Visual systems, Captioning, Large language model, Semantics, Neuroimaging, Functional magnetic resonance imaging

TL;DR: We propose LaVCa, a novel method that generates data-driven captions for individual voxels.

Abstract: Understanding the properties of neural populations (or voxels) in the human brain can advance our comprehension of human perceptual and cognitive processing and contribute to the development of brain-inspired computer models. Recent encoding models using deep neural networks (DNNs) have successfully predicted voxel-wise activity. However, interpreting the properties that explain voxel responses remains challenging because of the black-box nature of DNNs. To address this, we propose LLM-assisted Visual Cortex Captioning (LaVCa), a data-driven approach that leverages large language models (LLMs) to generate natural-language captions for images to which voxels are selective. Applying LaVCa to image-evoked brain activity, we demonstrate that it generates captions that describe voxel selectivity more accurately than previous approaches. The captions generated by LaVCa quantitatively capture more detailed properties than those of the existing method at both the inter-voxel and intra-voxel levels. Furthermore, we find richer representational content within cortical regions that prior neuroimaging studies have deemed selective for simpler categories. By assigning detailed captions throughout the visual cortex, these findings offer insights into human visual representations and highlight the potential of LLM-based methods for understanding brain representations.
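As a rough illustration of the two ideas the abstract builds on (a voxel-wise encoding model, and finding the images to which a voxel is selective), the sketch below fits a ridge regression from image features to voxel responses and ranks candidate images by predicted response. It is only a minimal sketch under stated assumptions: the ridge regressor, the synthetic data, and the names `fit_ridge` and `top_images_for_voxel` are hypothetical and are not taken from the submission, and the LLM captioning step is not shown.

```python
import numpy as np


def fit_ridge(features, responses, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y.

    features:  (n_images, n_features) image features (e.g. from a DNN).
    responses: (n_images, n_voxels) measured voxel responses.
    Returns a weight matrix of shape (n_features, n_voxels).
    """
    n_features = features.shape[1]
    gram = features.T @ features + alpha * np.eye(n_features)
    return np.linalg.solve(gram, features.T @ responses)


def top_images_for_voxel(weights, candidate_features, voxel, k=5):
    """Return indices of the k candidate images with the highest
    predicted response for one voxel, in descending order."""
    predicted = candidate_features @ weights[:, voxel]
    return np.argsort(predicted)[::-1][:k]


# Synthetic stand-in data (assumption: real use would take DNN features
# of the stimulus images and fMRI responses instead).
rng = np.random.default_rng(0)
n_train, n_feat, n_vox = 200, 16, 3
X = rng.standard_normal((n_train, n_feat))
true_W = rng.standard_normal((n_feat, n_vox))
Y = X @ true_W + 0.1 * rng.standard_normal((n_train, n_vox))

W = fit_ridge(X, Y, alpha=1.0)

# Rank a pool of candidate images for voxel 0; in a captioning pipeline
# these top images would be the ones passed on to be described.
candidates = rng.standard_normal((50, n_feat))
top = top_images_for_voxel(W, candidates, voxel=0, k=5)
```

In this toy setting the selected indices in `top` point at the candidate images the fitted model predicts will drive voxel 0 most strongly; any captioning step would operate on those images.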

Supplementary Material: zip

Primary Area: applications to neuroscience & cognitive science

Submission Number: 19639
