Analyzing Representational Shifts in Multimodal Models: A Study of Feature Dynamics in Gemma and PaliGemma

Published: 24 Sept 2025, Last Modified: 24 Sept 2025 · INTERPLAY · CC BY 4.0
Keywords: Multimodal Models, Sparse Autoencoders, Vision-Language Models, Representational Shifts, Feature Dynamics, Gemma, PaliGemma, Interpretability, Activation Patterns, Cross-Modal Analysis
Abstract: Understanding the internal representational shifts that occur when adapting large language models (LLMs) into vision-language models (VLMs) provides insight into trade-offs in model interpretability, feature reuse, and task specialization. This paper presents an empirical study of the representational shifts that arise when extending the LLM Gemma2-2B into its multimodal successor, PaliGemma2-3B. Our initial performance analysis reveals that sparse autoencoders (SAEs) trained on Gemma struggle to reconstruct PaliGemma's activations, motivating a deeper investigation into its activation patterns. Across 26 layers, 37% of SAE features show reduced activation in PaliGemma relative to Gemma. Further experiments on CIFAR-100 and TruthfulQA reveal that PaliGemma relies heavily on visual inputs, activating substantially fewer features for text alone. Additional analyses—including residual-stream SAE performance analysis, activation frequency and dead-feature quantification, cross-modal feature activity patterns, and semantic robustness under label perturbations—provide consistent evidence that PaliGemma's internal representations are more visually grounded and less aligned with purely textual features. Our findings suggest key representational trade-offs in feature dynamics when transitioning from unimodal to multimodal models.
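The reconstruction check described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's pipeline: the SAE weights here are random stand-ins (a real analysis would load an SAE trained on Gemma's residual stream), the dimensions are arbitrary, and the PaliGemma activations are simulated. The two diagnostics shown — fraction of variance unexplained (FVU) for reconstruction quality, and the share of dictionary features that never fire ("dead features") — correspond to the kinds of measurements the abstract names.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: residual-stream width, SAE dictionary size, token count.
d_model, d_sae, n_tokens = 64, 256, 1000

# Stand-in for an SAE trained on Gemma activations (random weights here;
# a real study would load trained encoder/decoder matrices).
W_enc = rng.normal(0, 0.1, (d_model, d_sae))
b_enc = np.zeros(d_sae)
W_dec = rng.normal(0, 0.1, (d_sae, d_model))
b_dec = np.zeros(d_model)

def sae_reconstruct(x):
    """Encode activations with a ReLU sparsity nonlinearity, then decode."""
    feats = np.maximum(x @ W_enc + b_enc, 0.0)  # sparse feature activations
    return feats @ W_dec + b_dec, feats

# Stand-in for PaliGemma residual-stream activations at one layer.
acts = rng.normal(0, 1.0, (n_tokens, d_model))

recon, feats = sae_reconstruct(acts)

# Fraction of variance unexplained: values near (or above) 1 mean the
# Gemma-trained SAE fails to reconstruct PaliGemma's activations.
fvu = np.square(acts - recon).sum() / np.square(acts - acts.mean(0)).sum()

# Share of dictionary features that never activate on this batch.
dead_frac = float((feats.max(axis=0) == 0.0).mean())

print(f"FVU: {fvu:.3f}, dead features: {dead_frac:.1%}")
```

Running the same two diagnostics per layer, on Gemma versus PaliGemma activations, would surface the layer-wise drop in feature activation that the abstract reports.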
Public: Yes
Track: Main-Long
Submission Number: 28