Towards Understanding Multimodal Fine-Tuning: A Case Study into Spatial Features

Published: 22 Sept 2025, Last Modified: 03 Jan 2026, WiML @ NeurIPS 2025, CC BY 4.0
Keywords: Vision–Language Models, Multimodal Training, Mechanistic Interpretability, Stage-wise Model Diffing, Sparse Autoencoders, Spatial Reasoning, Attention Heads
Submission Number: 397