Learning Optimal Multimodal Information Bottleneck Representations

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
Abstract: Leveraging high-quality joint representations from multimodal data can greatly enhance model performance in various machine-learning-based applications. Recent multimodal learning methods, based on the multimodal information bottleneck (MIB) principle, aim to generate an optimal MIB representation with maximal task-relevant information and minimal superfluous information via regularization. However, these methods often set regularization weights in an *ad hoc* manner and overlook imbalanced task-relevant information across modalities, limiting their ability to achieve the optimal MIB. To address this gap, we propose a novel multimodal learning framework, Optimal Multimodal Information Bottleneck (OMIB), whose optimization objective guarantees the achievability of the optimal MIB by setting the regularization weight within a theoretically derived bound. OMIB further addresses imbalanced task-relevant information by dynamically adjusting the regularization weight per modality, ensuring the inclusion of all task-relevant information. Moreover, we establish a solid information-theoretic foundation for OMIB's optimization and implement it under the variational approximation framework for computational efficiency. Finally, we empirically validate OMIB's theoretical properties on synthetic data and demonstrate its superiority over state-of-the-art benchmark methods in various downstream tasks.
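For context, the information bottleneck principle trades task-relevant information against compression. A generic multimodal variant with per-modality regularization weights, written as a sketch of the general principle rather than OMIB's exact objective (the symbols $Z$, $X_m$, $Y$, and $\beta_m$ are illustrative), takes the form:

```latex
\max_{p(z \mid x_1, \dots, x_M)} \; I(Z; Y) \;-\; \sum_{m=1}^{M} \beta_m \, I(Z; X_m)
```

Here $I(Z;Y)$ measures the task-relevant information retained in the joint representation $Z$, and each $\beta_m$ controls how strongly superfluous information from modality $X_m$ is compressed. The abstract's claimed contribution corresponds to bounding and dynamically adapting these weights rather than fixing them by hand.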
Lay Summary: Machines can learn better when they use information from multiple sources, like combining what they hear and see. However, blending this information well is tricky, especially when one source is more useful than the others. Our research tackles this problem by developing a new method that helps computers find the right balance: keeping the useful information while filtering out what’s unnecessary. Unlike previous approaches, our method sets this balance using a mathematically sound rule rather than trial and error. It can even adjust how much each source matters, depending on how helpful it is. We tested our approach on both simulated and real-world data, and it consistently outperformed other leading methods. This could make future AI systems smarter and more adaptable, especially in complex situations that require understanding information from different perspectives.
Primary Area: General Machine Learning->Representation Learning
Keywords: Multimodal Learning; Information Bottleneck
Submission Number: 8650