Keywords: Multi-modal Clustering, Trusted Learning
TL;DR: We propose a novel Full-Stage Trusted Multi-modal Clustering (FSTMC) method.
Abstract: Multi-modal clustering (MMC) aims to integrate complementary information from different modalities to uncover latent consistent structures and improve clustering performance. However, existing methods mainly rely on predictive (result) uncertainty to improve robustness, while often neglecting the aleatoric (data) uncertainty introduced by sample noise and the epistemic (model) uncertainty induced by model parameters and structural variations. To this end, we propose a novel Full-Stage Trusted Multi-modal Clustering (FSTMC) method. To achieve trustworthiness at every stage, we jointly exploit aleatoric, epistemic, and predictive uncertainty to optimize the model, learn more reliable feature representations, and obtain more reliable clustering results. In the representation learning stage, probabilistic modeling is employed to capture stable latent representations that account for aleatoric uncertainty, while structured stochastic perturbations are introduced to estimate epistemic uncertainty. In the clustering stage, we replace conventional feature-level fusion with an evidence-based strategy: soft labels from each modality are mapped into categorical evidence, class distributions are parameterized via a Dirichlet model, and dynamic cross-modal fusion is achieved through Dempster–Shafer theory. To mitigate overconfidence and cross-modal conflicts, prior constraints guided by aleatoric and epistemic uncertainty are imposed, yielding calibrated predictive uncertainty. Finally, we exploit predictive uncertainty to selectively incorporate pseudo-labels for optimization, forming a virtuous cycle. Extensive experiments on multiple multi-modal benchmark datasets demonstrate that our approach significantly improves both credibility and accuracy compared to state-of-the-art methods.
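To make the evidence-based fusion step concrete, below is a minimal NumPy sketch of the Dirichlet parameterization of per-modality evidence and the reduced Dempster–Shafer combination rule commonly used in trusted multi-view learning. This is an illustrative sketch, not the authors' released implementation: the function names, the two-modality setup, and the toy evidence values are assumptions for demonstration only.

```python
import numpy as np

def dirichlet_from_evidence(evidence):
    """Map non-negative class evidence e to Dirichlet parameters
    alpha = e + 1, per-class belief masses b_k = e_k / S, and an
    overall uncertainty mass u = K / S, where S = sum(alpha)."""
    alpha = evidence + 1.0
    strength = alpha.sum()
    belief = evidence / strength
    uncertainty = len(alpha) / strength  # sum(belief) + uncertainty == 1
    return alpha, belief, uncertainty

def ds_combine(b1, u1, b2, u2):
    """Reduced Dempster-Shafer combination of two modalities'
    belief masses (b1, b2) and uncertainty masses (u1, u2)."""
    # Conflict C: total mass assigned to disagreeing class pairs.
    conflict = np.sum(np.outer(b1, b2)) - np.sum(b1 * b2)
    norm = 1.0 - conflict
    b = (b1 * b2 + b1 * u2 + b2 * u1) / norm
    u = (u1 * u2) / norm  # fused uncertainty shrinks when modalities agree
    return b, u

# Toy example with 3 clusters: modality 1 is confident about class 0,
# modality 2 is weaker and partially conflicting (values are hypothetical).
e1 = np.array([9.0, 1.0, 0.5])
e2 = np.array([4.0, 3.0, 0.2])
_, b1, u1 = dirichlet_from_evidence(e1)
_, b2, u2 = dirichlet_from_evidence(e2)
b, u = ds_combine(b1, u1, b2, u2)
print("fused belief:", b, "fused uncertainty:", u)
```

Under this scheme the fused belief and uncertainty still sum to one, and the fused uncertainty u can serve as the calibrated predictive uncertainty used to gate pseudo-label selection in the final stage.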
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 7333