Keywords: Multi-view learning, multi-modal learning, variational autoencoders, tensor computation
Abstract: Multi-view or multi-modal learning, in theory, should enhance clustering results by leveraging information from other modalities. However, it is commonly observed that incorporating more modalities or views does not necessarily improve clustering performance due to the lack of proper information alignment and the reliance on oversimplified techniques for handling missing data.
To address these limitations, we propose LAMVC, a Low-rank tensor-steered variational Autoencoder for incomplete Multi-View Clustering. LAMVC learns a similarity matrix for each view via a reformulated Kullback-Leibler (KL) divergence in latent space, capturing accurate sample relationships even with missing views. These matrices are then normalized and stacked into a tensor, which is decomposed into view-commonality and view-specificity tensors.
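The stack-and-decompose step described above can be sketched as follows. This is a minimal illustration under assumptions: the row-normalization and the simple additive mean/residual split along the view mode are stand-ins, not the paper's exact normalization or low-rank tensor decomposition.

```python
import numpy as np

# Hypothetical sketch: stack per-view similarity matrices into a tensor,
# then split it into a "view-commonality" part and a residual
# "view-specificity" part. The mean/residual split is an assumption,
# not the paper's actual decomposition.

rng = np.random.default_rng(0)
n_samples, n_views = 6, 3

# Per-view similarity matrices (symmetrized, row-normalized as a
# stand-in for the paper's normalization step).
sims = []
for _ in range(n_views):
    S = rng.random((n_samples, n_samples))
    S = (S + S.T) / 2                      # symmetrize
    S = S / S.sum(axis=1, keepdims=True)   # row-normalize
    sims.append(S)

T = np.stack(sims, axis=2)                 # tensor of shape (n, n, V)

# Commonality: the slice shared across views (mean over the view mode);
# specificity: what each view contributes beyond it.
common = T.mean(axis=2, keepdims=True)     # (n, n, 1), broadcast over views
specific = T - common                      # per-view deviations

# This additive split reconstructs the stacked tensor exactly.
assert np.allclose(common + specific, T)
```

By construction, the specificity slices sum to zero across views, so all shared structure is absorbed into the commonality term.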
Furthermore, we employ a novel loss function that enforces precise data reconstruction and captures meaningful distributional relationships through the optimized decomposed tensors, eliminating the need for predefined distributions. We apply LAMVC to protein fold clustering with multi-modal information, including sequence alignments and structural predictions.
Experimental results demonstrate that LAMVC significantly outperforms existing incomplete multi-view clustering (IMVC) models across multiple datasets, with additional validation through robustness and ablation studies.
Submission Number: 23