Deciphering Cross-Modal Feature Interactions in Multimodal AIGC Models: A Mechanistic Interpretability Approach

Published: 18 Sept 2025 (modified: 12 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · License: CC BY 4.0
Keywords: Mechanistic Interpretability, Multimodal AI, Sparse Autoencoders, Causal Intervention, AIGC Models
Abstract: The rapid advancement of multimodal AI-generated content (AIGC) models has created an urgent need for understanding their internal mechanisms, particularly how these systems integrate and process information across different modalities. This paper presents a novel mechanistic interpretability framework that combines sparse autoencoders (SAEs) with causal intervention techniques to dissect cross-modal feature interactions in state-of-the-art multimodal AIGC models. We introduce the Cross-Modal Mechanistic Analysis (CMMA) methodology, which systematically identifies and manipulates interpretable features responsible for multimodal content generation. Through comprehensive experiments on Vision-Language Models (VLMs) including CLIP, LLaVA, and DALL-E variants using 2.5M carefully curated multimodal samples, our approach reveals three distinct phases of cross-modal information processing: feature extraction, modal alignment, and concept synthesis. We demonstrate that targeted interventions on discovered features can significantly improve generation quality while reducing hallucinations by $34.2\% \pm 2.1\%$ ($p < 0.001$) and enhancing semantic consistency by $28.7\% \pm 1.8\%$ ($p < 0.001$). Our findings provide crucial insights into the mechanistic foundations of multimodal AIGC systems and establish a roadmap for developing more interpretable and controllable generative models.
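The abstract describes combining sparse autoencoders (SAEs) with causal interventions on the discovered features. Below is a minimal illustrative sketch of that general recipe, not the paper's CMMA implementation: the class names, dimensions, loss coefficients, and the feature index are assumptions for illustration, and activations are stand-ins for what would actually be cached from a hook on a cross-modal layer of a VLM.

```python
import torch
import torch.nn as nn


class SparseAutoencoder(nn.Module):
    """Minimal SAE over model activations (illustrative only)."""

    def __init__(self, d_model: int, d_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, x: torch.Tensor):
        # ReLU encoding gives non-negative, sparsity-friendly feature activations.
        f = torch.relu(self.encoder(x))
        x_hat = self.decoder(f)
        return x_hat, f


def sae_loss(x, x_hat, f, l1_coeff: float = 1e-3):
    """Reconstruction error plus an L1 penalty encouraging sparse feature use."""
    recon = torch.mean((x - x_hat) ** 2)
    sparsity = l1_coeff * f.abs().mean()
    return recon + sparsity


def ablate_feature(acts: torch.Tensor, sae: SparseAutoencoder, idx: int):
    """Causal intervention: zero one learned feature, then map back to activation space."""
    _, f = sae(acts)
    f_edit = f.clone()
    f_edit[..., idx] = 0.0           # knock out the feature of interest
    return sae.decoder(f_edit)        # patched activations to feed back into the model


if __name__ == "__main__":
    # Hypothetical sizes; real values depend on the layer being analyzed.
    d_model, d_features = 512, 4096
    sae = SparseAutoencoder(d_model, d_features)
    acts = torch.randn(8, d_model)                # stand-in for cached VLM activations
    x_hat, f = sae(acts)
    loss = sae_loss(acts, x_hat, f)
    patched = ablate_feature(acts, sae, idx=123)  # hypothetical feature index
    print(loss.item(), patched.shape)
```

In such a setup, measuring how generation quality or hallucination metrics change when patched activations replace the originals is what grounds the causal claims about individual cross-modal features.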
Primary Area: interpretability and explainable AI
Submission Number: 11957