Abstract: Partially View-aligned Clustering (PVC) is challenging because it requires fully exploiting complementary and consistent information when only part of the multi-view data is aligned across views. Existing PVC methods typically learn view correspondence from latent features that are expected to carry common semantic information. However, latent features derived from heterogeneous spaces, together with the requirement that they be aligned in a shared feature dimension, can introduce cross-view discrepancies. In particular, partially view-aligned data provides too few known correspondences to learn reliable common semantic features, which leads to inaccurate correspondences between latent features across views. Although feature representations differ across views, the relationships among instances within each view may encode semantics that are consistent across views. Motivated by this observation, we learn view correspondence using graph distribution metrics that capture semantic, view-invariant instance relationships. Specifically, we use similarity graphs to represent instance relationships and learn view correspondence by aligning the semantic similarity graphs of different views through optimal transport over graph distributions. This enables accurate view alignment even in the presence of heterogeneous, view-specific feature distortions. Furthermore, building on the learned cross-view correspondence, we introduce a cross-view contrastive learning objective that learns semantic features by exploiting consistency information. The resulting semantic features isolate shared latent patterns while excluding irrelevant view-private information. Extensive experiments on several real-world datasets demonstrate the effectiveness of the proposed method for the PVC task.
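The abstract does not define the "graph distribution" or name the optimal transport solver, so the following is only a minimal, hypothetical sketch of the general idea, not the authors' method. It assumes that (a) each unaligned sample is described by its similarity distribution over the already-aligned samples of the same view (one row of a bipartite similarity graph), which stays comparable across views even when feature spaces differ, and (b) correspondence is obtained with standard entropic optimal transport (Sinkhorn iterations) using the L1 distance between these distributions as the cost. All function names (`anchor_profiles`, `align_views`, `to_view2`) and parameters (`tau`, `eps`) are illustrative assumptions.

```python
import math

# Hypothetical sketch, NOT the paper's implementation: instance relationships are
# summarized as each sample's similarity distribution over aligned anchor pairs.

def cosine(u, v):
    """Cosine similarity between two feature vectors (lists of floats)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + 1e-12)

def softmax(scores, tau):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    m = max(scores)
    exps = [math.exp((s - m) / tau) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def anchor_profiles(samples, anchors, tau=0.1):
    """Row-normalized similarity graph between samples and one view's aligned anchors."""
    return [softmax([cosine(x, a) for a in anchors], tau) for x in samples]

def sinkhorn(cost, eps=0.05, iters=200):
    """Entropic OT plan between uniform marginals via Sinkhorn-Knopp scaling."""
    n, m = len(cost), len(cost[0])
    K = [[math.exp(-c / eps) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        u = [(1.0 / n) / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [(1.0 / m) / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

def align_views(x1, x2, anchors1, anchors2, tau=0.1, eps=0.05):
    """Match unaligned view-1 samples to view-2 samples by comparing graph profiles."""
    p1 = anchor_profiles(x1, anchors1, tau)
    p2 = anchor_profiles(x2, anchors2, tau)
    # L1 distance between relation distributions (bounded in [0, 2]) as transport cost.
    cost = [[sum(abs(a - b) for a, b in zip(r1, r2)) for r2 in p2] for r1 in p1]
    plan = sinkhorn(cost, eps)
    # Hard correspondence: the most likely view-2 partner of each view-1 sample.
    matches = [max(range(len(row)), key=row.__getitem__) for row in plan]
    return matches, plan

# Toy usage: view 2 lives in a different (3-D) space obtained by rotating view 1
# and padding a dimension, so raw features differ but pairwise cosines are preserved.
def to_view2(p):
    return [p[1], -p[0], 0.0]

anchors1 = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
anchors2 = [to_view2(a) for a in anchors1]
x1 = [[1.0, 0.2], [0.1, 1.0], [-1.0, -0.5]]
x2 = [to_view2(x1[2]), to_view2(x1[0]), to_view2(x1[1])]  # shuffled order
matches, plan = align_views(x1, x2, anchors1, anchors2)
print(matches)  # [1, 2, 0]
```

In this toy case the feature vectors of the two views cannot be compared directly (they even have different dimensions), yet each sample's relation profile over the anchors is identical across views, so the transport plan recovers the shuffled order. A realistic system would operate on learned embeddings and batched tensors; the pure-Python loops here only keep the sketch dependency-free.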
Primary Subject Area: [Content] Multimodal Fusion
Relevance To Conference: In multimedia applications, datasets often contain multiple feature representations of each sample, such as images, text, and videos. Such data is collectively known as multi-view data. Multi-View Clustering (MVC) aims to improve clustering performance by exploiting the consistent and complementary information inherent in multi-view data. However, the effectiveness of existing MVC methods relies on the idealized assumption that all views are perfectly aligned. In practice, this assumption is easily violated by imperfect data collection, where only a subset of samples is aligned across views. There is therefore a critical need for models that can accurately learn view correspondences, so that consistent and complementary information can be exploited in partially aligned multi-view data. This work proposes a novel approach to the practical challenge of Partially View-aligned Clustering (PVC).
Supplementary Material: zip
Submission Number: 2316