TL;DR: We give the multi-modal clustering problem a new peer-review perspective: we iteratively treat one modality as the "author" and the remaining modalities as "reviewers" to obtain a peer-review score for each modality.
Abstract: Most current weight-based multi-modal clustering methods are strong at exploiting complementary information and learning a consistent clustering structure. However, they still have three limitations: 1) the learned weights lack trustworthiness; 2) each modality's weight is learned in isolation; 3) extra weight parameters are required. Motivated by the peer-review mechanism in academia, we give the multi-modal clustering problem a new peer-review perspective. We propose to iteratively treat one modality as the "author" and the remaining modalities as "reviewers", so as to obtain a peer-review score for each modality. This essentially explores the underlying relationships among modalities. To improve trustworthiness, we further design a new trustworthy score based on a self-supervised mechanism. On this basis, we propose a novel Peer-review Trustworthy Information Bottleneck (PTIB) method for weighted multi-modal clustering. PTIB takes both scores into account simultaneously for accurate, parameter-free modality weight learning. Extensive experiments on eight multi-modal datasets suggest that PTIB can outperform state-of-the-art multi-modal clustering methods.
Lay Summary: We want to know how much each of a sample's multiple modalities (e.g., images, text) can help group samples when no true labels are available. Most existing methods answer this by letting each modality independently claim how much useful information it has. But a group of people discussing a question usually does better than one person pondering alone. By analogy with peer review in academia, we regard one modality as the "author" and the other modalities as "reviewers" who score the "author". To keep this process fair and reliable, the joint grouping result of all modalities acts as the editor-in-chief or associate editor, judging whether each reviewer's score is reasonable. Finally, by selectively weighing the scores of all "reviewers", we determine a reasonable contribution for each "author" modality. In our experiments, this approach obtained better results than many existing methods. Our research offers a new possible idea for integrating multi-source information.
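The author/reviewer/editor idea described above can be illustrated with a small toy sketch. This is NOT the PTIB method: PTIB derives its scores within an information bottleneck framework, which is not detailed here. Instead, the sketch assumes that (1) each modality has already been clustered into hard labels, (2) a reviewer's score for an author modality is the normalized mutual information (NMI) between their cluster assignments, and (3) the "editor" rates each reviewer's trustworthiness as the NMI between that reviewer's labels and a consensus clustering of all modalities. The function names `nmi` and `peer_review_weights` are hypothetical.

```python
import numpy as np


def nmi(a, b):
    """Normalized mutual information between two hard label vectors (sqrt normalization)."""
    a = np.asarray(a)
    b = np.asarray(b)
    n = a.size
    # Map arbitrary labels to 0..k-1 so they can index a contingency table.
    _, a_idx = np.unique(a, return_inverse=True)
    _, b_idx = np.unique(b, return_inverse=True)
    table = np.zeros((a_idx.max() + 1, b_idx.max() + 1))
    np.add.at(table, (a_idx, b_idx), 1)
    p_ab = table / n
    p_a = p_ab.sum(axis=1)
    p_b = p_ab.sum(axis=0)
    nz = p_ab > 0
    mi = np.sum(p_ab[nz] * np.log(p_ab[nz] / np.outer(p_a, p_b)[nz]))
    h_a = -np.sum(p_a * np.log(p_a))
    h_b = -np.sum(p_b * np.log(p_b))
    if h_a == 0.0 or h_b == 0.0:
        # Degenerate case: a single-cluster labeling carries no information.
        return 1.0 if h_a == h_b else 0.0
    return float(mi / np.sqrt(h_a * h_b))


def peer_review_weights(modal_labels, consensus):
    """Hypothetical peer-review weighting of M >= 2 modalities (toy proxy, not PTIB)."""
    m = len(modal_labels)
    if m < 2:
        raise ValueError("peer review needs at least two modalities")
    # "Editor" step: trust in each modality as a reviewer = agreement with the consensus.
    trust = np.array([nmi(lbl, consensus) for lbl in modal_labels])
    scores = np.zeros(m)
    for author in range(m):
        reviewers = [r for r in range(m) if r != author]
        # Each reviewer scores the author by how well their clusterings agree.
        review = np.array([nmi(modal_labels[author], modal_labels[r]) for r in reviewers])
        t = trust[reviewers]
        # Trust-weighted average of reviewer scores; plain mean if no reviewer is trusted.
        scores[author] = review @ t / t.sum() if t.sum() > 0 else review.mean()
    if scores.sum() == 0:
        return np.full(m, 1.0 / m)
    # Normalize to modality weights; no tunable weight hyperparameters are introduced.
    return scores / scores.sum()


if __name__ == "__main__":
    consensus = np.array([0, 0, 0, 1, 1, 1])
    views = [
        np.array([0, 0, 0, 1, 1, 1]),  # agrees with the consensus
        np.array([1, 1, 1, 0, 0, 0]),  # same partition, relabeled
        np.array([0, 1, 0, 1, 0, 1]),  # noisy modality
    ]
    print(peer_review_weights(views, consensus).round(3))
```

In this toy setup the noisy third modality receives a much smaller weight than the two consistent ones, and because its own trust score is low, its unfavorable reviews of the other modalities barely affect their scores. That second effect is the role the lay summary assigns to the "editor".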
Primary Area: General Machine Learning->Clustering
Keywords: Multi-modal clustering, Information bottleneck
Submission Number: 2808