Keywords: Multi-modal Clustering, Trusted Learning
TL;DR: We propose a novel Full-Stage Trusted Multi-modal Clustering (FSTMC) method.
Abstract: Multi-modal clustering (MMC) aims to integrate complementary information from different modalities to uncover latent consistent structures and improve clustering performance. However, existing methods mainly rely on predictive (result) uncertainty to improve robustness, while often neglecting the aleatoric (data) uncertainty introduced by sample noise and the epistemic (model) uncertainty induced by model parameters and structural variations. To this end, we propose a novel Full-Stage Trusted Multi-modal Clustering (FSTMC) method. To achieve trustworthiness at every stage, we jointly exploit aleatoric, epistemic, and predictive uncertainty to optimize the model, learn more reliable feature representations, and obtain more reliable clustering results. In the representation learning stage, probabilistic modeling is employed to capture stable latent representations that account for aleatoric uncertainty, while structured stochastic perturbations are introduced to estimate epistemic uncertainty. In the clustering stage, we replace conventional feature-level fusion with an evidence-based strategy: soft labels from each modality are mapped into categorical evidence, class distributions are parameterized via a Dirichlet model, and dynamic cross-modal fusion is achieved through Dempster–Shafer theory. To mitigate overconfidence and cross-modal conflicts, prior constraints guided by aleatoric and epistemic uncertainty are imposed, yielding calibrated predictive uncertainty. Finally, we exploit predictive uncertainty to selectively incorporate pseudo-labels for optimization, forming a virtuous cycle. Extensive experiments on multiple multi-modal benchmark datasets demonstrate that our approach significantly improves both credibility and accuracy compared to state-of-the-art methods.
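To make the evidence-based fusion step concrete, below is a minimal NumPy sketch of the Dirichlet parameterization of per-modality evidence and the reduced Dempster–Shafer combination rule commonly used in trusted multi-view learning. This is an illustrative sketch, not the authors' released implementation: the function names, the two-modality setup, and the toy evidence values are assumptions for demonstration only.

```python
import numpy as np

def dirichlet_from_evidence(evidence):
    """Map non-negative class evidence e to Dirichlet parameters
    alpha = e + 1, per-class belief masses b_k = e_k / S, and an
    overall uncertainty mass u = K / S, where S = sum(alpha)."""
    alpha = evidence + 1.0
    strength = alpha.sum()
    belief = evidence / strength
    uncertainty = len(alpha) / strength  # sum(belief) + uncertainty == 1
    return alpha, belief, uncertainty

def ds_combine(b1, u1, b2, u2):
    """Reduced Dempster-Shafer combination of two modalities'
    belief masses (b1, b2) and uncertainty masses (u1, u2)."""
    # Conflict C: total mass assigned to disagreeing class pairs.
    conflict = np.sum(np.outer(b1, b2)) - np.sum(b1 * b2)
    norm = 1.0 - conflict
    b = (b1 * b2 + b1 * u2 + b2 * u1) / norm
    u = (u1 * u2) / norm  # fused uncertainty shrinks when modalities agree
    return b, u

# Toy example with 3 clusters: modality 1 is confident about class 0,
# modality 2 is weaker and partially conflicting (values are hypothetical).
e1 = np.array([9.0, 1.0, 0.5])
e2 = np.array([4.0, 3.0, 0.2])
_, b1, u1 = dirichlet_from_evidence(e1)
_, b2, u2 = dirichlet_from_evidence(e2)
b, u = ds_combine(b1, u1, b2, u2)
print("fused belief:", b, "fused uncertainty:", u)
```

Under this scheme the fused belief and uncertainty still sum to one, and the fused uncertainty u can serve as the calibrated predictive uncertainty used to gate pseudo-label selection in the final stage.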
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 7333