TL;DR: This paper proposes super deep contrastive information bottleneck (SDCIB), a method for multi-modal clustering (MMC) that aims to explore and utilize all types of latent information to the fullest extent.
Abstract: In an era of increasingly diverse information sources, multi-modal clustering (MMC) has become a key technology for processing multi-modal data, as it integrates the feature information and latent relationships of different modalities. Although there is a wealth of research on MMC, the complexity of real-world datasets leaves a major challenge: how to deeply explore the complex latent information and the interdependencies between modalities. To address this issue, this paper proposes a method called super deep contrastive information bottleneck (SDCIB) for MMC, which aims to explore and utilize all types of latent information to the fullest extent. Specifically, SDCIB explicitly introduces, for the first time, the rich information contained in the encoder's hidden layers into the loss function, thoroughly mining both the modal features and the hidden relationships between modalities. Moreover, SDCIB performs dual optimization by simultaneously considering consistency information from both the feature-distribution and clustering-assignment perspectives, which significantly improves clustering accuracy and robustness. We conducted experiments on four multi-modal datasets; on the ESP dataset, the accuracy of the method improved by 9.3%. The results demonstrate the superiority and careful design of the proposed SDCIB. The source code is available at https://github.com/ShizheHu.
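The abstract's "dual optimization" couples consistency at two levels: feature distributions and clustering assignments. As an illustration only (not the authors' actual objective, whose exact form is in the paper and code), the sketch below pairs an InfoNCE-style contrastive term that aligns two modalities' features with a symmetric-KL agreement term over their soft cluster assignments. All names here (`dual_consistency_loss`, `z_a`, `q_a`, the temperature value) are hypothetical.

```python
import numpy as np

def dual_consistency_loss(z_a, z_b, q_a, q_b, temperature=0.5):
    """Illustrative dual-consistency objective (hypothetical, not SDCIB itself).

    z_a, z_b: (N, D) feature embeddings of the same N samples in two modalities.
    q_a, q_b: (N, K) soft cluster-assignment distributions (rows sum to 1).
    """
    # Feature level: InfoNCE-style contrastive alignment across modalities,
    # treating the i-th sample of each modality as the positive pair.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature                 # (N, N) similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    feat_loss = -np.mean(np.diag(log_probs))           # positives on diagonal

    # Cluster level: symmetric KL divergence pushing the two modalities'
    # soft assignments to agree on each sample.
    eps = 1e-8
    qa, qb = q_a + eps, q_b + eps
    kl_ab = np.mean(np.sum(qa * np.log(qa / qb), axis=1))
    kl_ba = np.mean(np.sum(qb * np.log(qb / qa), axis=1))
    clu_loss = 0.5 * (kl_ab + kl_ba)

    return feat_loss + clu_loss
```

When the two modalities' assignments agree (`q_a == q_b`), the cluster term vanishes and only the contrastive feature term remains; any disagreement adds a positive KL penalty.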
Lay Summary: Computers can analyze many types of data, such as images, text, and speech, clustering them into different groups. We want to explore whether the intermediate information generated while the computer analyzes this data can, in turn, help it cluster the data better. We designed a "command center" (technically, a loss function) to guide the computer in analyzing the data. During analysis, the intermediate information that is generated is sent back to the "command center," which then guides the computer to analyze the data better and achieve an improved data partition. We found that using intermediate information in the "command center" leads to better performance in guiding the computer to analyze and group the data.
Our results show that the intermediate information generated during data analysis contains rich representations, enabling better data partitioning, which in turn facilitates the discovery of the intrinsic structure and patterns within the data.
Primary Area: General Machine Learning->Clustering
Keywords: Multi-modal clustering, Information bottleneck, Contrastive learning
Submission Number: 8300