Abstract: Sparse and noisy images (SNIs), like those in spatial gene expression data, pose significant challenges for effective representation learning and clustering, which are essential for thorough data analysis and interpretation. In response to these challenges, we propose $\textbf{D}$ual $\textbf{A}$dvancement of $\textbf{R}$epresentation $\textbf{L}$earning and $\textbf{C}$lustering ($\textit{\textbf{DARLC}}$), an innovative framework that leverages contrastive learning to enhance the representations derived from masked image modeling. Simultaneously, $\textit{DARLC}$ integrates cluster assignments in a cohesive, end-to-end approach. This integrated clustering strategy addresses the ``class collision problem'' inherent in contrastive learning, thus improving the quality of the resulting representations. To generate more plausible positive views for contrastive learning, we employ a graph attention network-based technique that produces denoised images as augmented data. As such, our framework offers a comprehensive approach that improves the learning of representations by enhancing their local perceptibility, distinctiveness, and understanding of relational semantics. Furthermore, we utilize a Student's t mixture model to achieve more robust and adaptable clustering of SNIs. Extensive evaluation on 12 real-world datasets of SNIs, representing spatial gene expressions, demonstrates $\textit{DARLC}$'s superiority over current state-of-the-art methods in both image clustering and generating representations that accurately reflect biosemantic content and gene interactions.
Primary Subject Area: [Content] Media Interpretation
Secondary Subject Area: [Content] Media Interpretation
Relevance To Conference: The Dual Advancement of Representation Learning and Clustering (DARLC) framework addresses the significant challenges posed by sparse and noisy images (SNIs) in multimedia domains such as medical imaging, aerial photography, and remote surveillance. By integrating contrastive learning with masked image modeling, employing a graph attention network for denoising, and using a Student's t mixture model for robust clustering, DARLC enhances both the distinctiveness and perceptibility of data representations. Extensive validation on real-world datasets demonstrates DARLC's superiority in representation learning and clustering, significantly outperforming state-of-the-art methods.
Supplementary Material: zip
Submission Number: 3772