Efficient Quantification of Multimodal Interaction at Sample Level

Published: 01 May 2025, Last Modified: 18 Jun 2025, ICML 2025 poster, License: CC BY 4.0
TL;DR: We efficiently quantify sample-level multimodal interaction and leverage these interactions for understanding and improving multimodal learning.
Abstract: Interactions between modalities, namely redundancy, uniqueness, and synergy, collectively determine the composition of multimodal information. Understanding these interactions is crucial for analyzing information dynamics in multimodal systems, yet quantifying them accurately at the sample level poses significant theoretical and computational challenges. To address this, we introduce the Lightweight Sample-wise Multimodal Interaction (LSMI) estimator, rigorously grounded in pointwise information theory. We first develop a redundancy estimation framework that employs an appropriate pointwise information measure to quantify redundancy, the most decomposable and measurable of the three interactions. Building on this, we propose a general interaction estimation method based on efficient entropy estimation, specifically tailored to sample-wise estimation in continuous distributions. Extensive experiments on synthetic and real-world datasets validate LSMI's precision and efficiency. Crucially, our sample-wise approach reveals fine-grained sample- and category-level dynamics within multimodal data, enabling practical applications such as redundancy-informed sample partitioning, targeted knowledge distillation, and interaction-aware model ensembling. The code is available at https://github.com/GeWu-Lab/LSMI_Estimator.
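The decomposition logic described in the abstract (estimate redundancy first, then derive the remaining interactions) can be illustrated on discrete toy data. This sketch is not the LSMI estimator. It assumes discrete variables with empirical frequencies instead of continuous entropy estimation, and it substitutes a hypothetical redundancy measure, the minimum of the two pointwise mutual informations, for the paper's own pointwise measure. Given that redundancy r, the per-sample unique and synergistic terms follow from pointwise analogues of the standard partial information decomposition identities: i(x1;y) = r + u1, i(x2;y) = r + u2, and i(x1,x2;y) = r + u1 + u2 + s. The helper names `pointwise_mi` and `decompose` are illustrative only.

```python
import math
from collections import Counter
from typing import Callable, Dict, Hashable, List, Tuple

Sample = Tuple[int, int, int]  # toy discrete sample: (x1, x2, y)


def _pmf(samples: List[Sample], key: Callable[[Sample], Hashable]) -> Dict[Hashable, float]:
    """Empirical probability mass function of key(sample)."""
    counts = Counter(key(s) for s in samples)
    n = len(samples)
    return {k: c / n for k, c in counts.items()}


def pointwise_mi(samples: List[Sample], src: Callable[[Sample], Hashable]) -> List[float]:
    """Per-sample i(src; y) = log2 p(src, y) / (p(src) p(y)), in bits."""
    p_sy = _pmf(samples, lambda s: (src(s), s[2]))
    p_s = _pmf(samples, src)
    p_y = _pmf(samples, lambda s: s[2])
    return [math.log2(p_sy[(src(s), s[2])] / (p_s[src(s)] * p_y[s[2]])) for s in samples]


def decompose(samples: List[Sample]) -> List[Dict[str, float]]:
    """Per-sample redundancy R, uniqueness U1/U2 and synergy S (hypothetical sketch)."""
    i1 = pointwise_mi(samples, lambda s: s[0])
    i2 = pointwise_mi(samples, lambda s: s[1])
    i12 = pointwise_mi(samples, lambda s: (s[0], s[1]))
    result = []
    for a, b, joint in zip(i1, i2, i12):
        r = min(a, b)  # ASSUMPTION: min-of-pointwise-MI redundancy, not LSMI's measure
        u1, u2 = a - r, b - r  # from i(x1;y) = R + U1 and i(x2;y) = R + U2
        syn = joint - r - u1 - u2  # from i(x1,x2;y) = R + U1 + U2 + S
        result.append({"R": r, "U1": u1, "U2": u2, "S": syn})
    return result


if __name__ == "__main__":
    xor = [(a, b, a ^ b) for a in (0, 1) for b in (0, 1)]  # y needs both inputs
    copy = [(a, a, a) for a in (0, 1)]                      # y is duplicated in x1 and x2
    print(decompose(xor)[0])   # R = U1 = U2 = 0, S = 1 bit
    print(decompose(copy)[0])  # R = 1 bit, U1 = U2 = S = 0
```

On XOR data every sample's information is purely synergistic, while on a copied bit it is purely redundant. Pointwise mutual information can be negative for individual samples, so a min-based redundancy can also go negative; how LSMI itself handles such cases is specified in the paper and the official repository, not in this sketch.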
Lay Summary: We quantify how information is generated through multimodal interactions, efficiently distinguishing whether it originates from shared sources across modalities, is specific to a single modality, or emerges synergistically from their combined effect. This quantification offers practical insights for real-world datasets.
Link To Code: https://github.com/GeWu-Lab/LSMI_Estimator
Primary Area: General Machine Learning->Everything Else
Keywords: Multimodal interactions, Information measurement, Information decomposition
Submission Number: 2900