Bifurcate then Alienate: Incomplete Multi-view Clustering via Coupled Distribution Learning with Linear Overhead

Published: 01 May 2025, Last Modified: 18 Jun 2025, ICML 2025 poster, CC BY 4.0
Abstract: Despite remarkable advances, existing incomplete multi-view clustering (IMC) methods typically leverage either perspective-shared or perspective-specific determinants, but not both, to encode cluster representations. To address this limitation, we introduce the BACDL algorithm, which explicitly captures both concurrently and thereby exploits heterogeneous data more effectively. It bifurcates feature clusters and further alienates them to enlarge their discrimination. Through distribution learning, it couples view guidance into the feature clusters to alleviate dimension inconsistency. Then, building on the principle that samples in a common cluster share similar marginal and conditional distributions, it unifies the association between feature clusters and sample clusters to bridge all views. Thereafter, all incomplete sample clusters are reordered and mapped to a common one to form the clustering embedding. Finally, its overall linear overhead makes it resource-efficient.
Lay Summary: Current methods for clustering incomplete multi-source data (like combining medical scans from different machines) have a key limitation: they only look for either similarities across all sources or unique patterns in individual sources. Our new BACDL algorithm solves this by:
1. Simultaneously identifying both shared patterns and source-specific features
2. Carefully separating feature groups to make clearer distinctions between clusters
3. Using smart distribution matching to align data from different sources
The system then reorganizes all the partial data into a unified clustering structure. Importantly, it does this efficiently without requiring heavy computations. This approach is particularly useful for real-world situations where:
1. Some data might be missing (like a patient missing one type of scan)
2. Different sources provide different types of information
3. You need to combine various data types while maintaining accuracy
Primary Area: General Machine Learning
Keywords: Incomplete Multi-view Clustering
Submission Number: 888