Head Pursuit: Probing Attention Specialization in Multimodal Transformers

Published: 30 Sept 2025, Last Modified: 30 Sept 2025 · Mech Interp Workshop (NeurIPS 2025) Poster · License: CC BY 4.0
Keywords: Probing, Understanding high-level properties of models
Other Keywords: Attention heads, Matching pursuit, Vision-language models
TL;DR: Attention heads in text-generative models specialize in semantic and visual concepts. We revisit signal processing tools to characterize such specialization and identify heads related to target concepts.
Abstract: Language and vision-language models have shown impressive performance across a wide range of tasks, but their internal mechanisms remain only partly understood. In this work, we study how individual attention heads in text-generative models specialize in certain semantic or visual attributes. We reinterpret the established practice of probing intermediate activations with the final decoding layer through the lens of signal processing. This lets us analyze multiple samples in a principled way and rank attention heads based on their relevance to target concepts. Our results show consistent patterns of specialization at the head level across both unimodal and multimodal transformers. Remarkably, we find that editing as few as 1% of the heads, selected using our method, can reliably impact targeted concepts in the model output.
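
The abstract names the core mechanics: per-head activations are probed with the final decoding layer (in the spirit of the logit lens), and heads are ranked for a target concept with a matching-pursuit-style greedy selection. Since the paper itself is not reproduced on this page, the sketch below is only a minimal illustration of that ranking step under assumed inputs; `head_outs`, `concept`, and all shapes are hypothetical placeholders, not the authors' implementation.

    import numpy as np

    def rank_heads_matching_pursuit(head_outs: np.ndarray,
                                    concept: np.ndarray,
                                    k: int = 5) -> list[int]:
        """Greedily pick the k heads whose residual-stream contributions best
        explain a concept direction (matching pursuit over head 'atoms')."""
        # Normalize each head's contribution so selection compares directions.
        atoms = head_outs / (np.linalg.norm(head_outs, axis=1, keepdims=True) + 1e-8)
        residual = concept.astype(np.float64)
        selected: list[int] = []
        for _ in range(k):
            proj = atoms @ residual        # projection of the residual on each atom
            mag = np.abs(proj)
            mag[selected] = -np.inf        # never pick the same head twice
            best = int(np.argmax(mag))
            selected.append(best)
            residual = residual - proj[best] * atoms[best]  # remove explained part
        return selected

    # Toy usage: 1024 heads writing into a 4096-dim residual stream; `concept`
    # stands in for an unembedding row W_U[:, target_token] of a real model.
    rng = np.random.default_rng(0)
    head_outs = rng.normal(size=(1024, 4096))
    concept = rng.normal(size=4096)
    print(rank_heads_matching_pursuit(head_outs, concept, k=3))
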
Submission Number: 120