Multi-aspect Self-guided Deep Information Bottleneck for Multi-modal Clustering

Published: 01 Jan 2025, Last Modified: 14 May 2025, AAAI 2025, CC BY-SA 4.0
Abstract: Deep multi-modal clustering can extract useful information across modalities, thereby benefiting the final clustering and many related fields. However, existing multi-modal clustering methods have two major limitations. First, they often ignore the different levels of guiding information contained in both the feature representations and the cluster assignments, which makes it difficult to learn discriminative representations. Second, most methods fail to effectively eliminate redundant information across modalities, which negatively affects the clustering results. In this paper, we propose a novel multi-aspect self-guided deep information bottleneck (MSDIB) method for multi-modal clustering, which effectively employs different aspects of guiding information to learn cluster-friendly information across modalities. MSDIB consists of two main parts: information compression and information preservation. In information compression, we compress the private information of each modality into a compact representation and, at the same time, perform mutual compression between modalities. In information preservation, the aim is to preserve the information shared across modalities as well as the self-supervised information obtained from the clustering results at each iteration. This process involves three aspects of self-guiding information: the modality-private information, the modality-shared information, and the self-supervised pseudo-label information. By minimizing the mutual-information-based objective with a variational optimization method, we fully extract useful discriminative information while eliminating the irrelevant parts. Extensive experimental results demonstrate that our method outperforms state-of-the-art multi-modal clustering methods, showcasing its superior performance and broad application prospects.
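The exact objective is defined in the paper; as a non-authoritative sketch only, an information-bottleneck objective of this kind can be written in the following generic form, where X^(v) denotes the input of modality v, Z^(v) its compressed representation, Y the pseudo cluster labels, and alpha, beta trade-off weights (all of this notation is our assumption, not the authors'):

% Illustrative sketch of a generic multi-modal information-bottleneck objective,
% not the exact MSDIB formulation.
\[
\min \; \mathcal{L} \;=\;
\underbrace{\sum_{v} I\bigl(X^{(v)}; Z^{(v)}\bigr)}_{\text{compress modality-private information}}
\;-\; \alpha \underbrace{\sum_{u \neq v} I\bigl(Z^{(u)}; Z^{(v)}\bigr)}_{\text{preserve modality-shared information}}
\;-\; \beta \underbrace{\sum_{v} I\bigl(Z^{(v)}; Y\bigr)}_{\text{preserve pseudo-label information}}
\]

In practice, mutual information terms of this kind are not computed exactly but are bounded with variational approximations, which is consistent with the variational optimization method mentioned in the abstract.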