Keywords: Contrastive Learning, Multimodal Representation Learning, Biocomputation, scRNA-seq, Cancer
TL;DR: We present a weakly supervised contrastive learning framework that aligns scRNA-seq profiles with CNV-derived subclusters as genomic anchors.
Abstract: Single-cell RNA sequencing (scRNA-seq) provides high-resolution insight into cellular heterogeneity, yet expression profiles alone often reflect lineage and housekeeping signals more strongly than tumor-intrinsic alterations. To address this, we present a weakly supervised contrastive learning framework that aligns scRNA-seq profiles with copy number variation (CNV) subclusters inferred from the same data. CNV embeddings are treated as fixed anchors, while a gene expression encoder is optimized to align with them in a shared latent space using a combination of contrastive, centroid, and intermediate (h-space) alignment losses. In a proof-of-concept analysis of a lung adenocarcinoma sample, the learned representation achieved 97.4% top-5 accuracy when retrieving CNV anchors from expression centroids in the latent space. The aligned embeddings enabled biologically meaningful downstream analysis, including differential expression between malignant and normal epithelial cells, which yielded candidate biomarkers. These results demonstrate that weak anchor guidance can ground scRNA-seq embeddings in genomic structure. While limited to a single patient, this work highlights the potential of multimodal contrastive learning to integrate inferred genomic and transcriptomic signals when only scRNA-seq is available.
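To make the anchor-alignment objective and the retrieval metric concrete, the following is a minimal NumPy sketch under stated assumptions, not the authors' implementation. It assumes an InfoNCE-style contrastive term in which each cell's expression embedding must pick out its own CNV subcluster anchor among all anchors (cosine similarity divided by a temperature), a centroid term that pulls each subcluster's expression centroid toward its anchor, and a top-k metric that ranks anchors by cosine similarity to each expression centroid. The function names, the temperature of 0.1, and the equal loss weights are hypothetical choices. The h-space term is omitted because the abstract does not specify its form. The code computes forward values only; training the encoder would require an autodiff framework.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def log_softmax(logits):
    # Numerically stable log-softmax along the last axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def cluster_centroids(z, labels, n_clusters):
    # Mean embedding per CNV subcluster (assumes every subcluster has >= 1 cell).
    return np.stack([z[labels == k].mean(axis=0) for k in range(n_clusters)])


def anchor_contrastive_loss(z, anchors, labels, temperature=0.1):
    # Hypothetical InfoNCE form: cross-entropy over cell-to-anchor similarities,
    # with the cell's own (fixed) CNV anchor as the positive class.
    logits = l2_normalize(z) @ l2_normalize(anchors).T / temperature
    logp = log_softmax(logits)  # shape (n_cells, n_clusters)
    return float(-logp[np.arange(len(labels)), labels].mean())


def centroid_alignment_loss(z, anchors, labels):
    # Hypothetical form: squared distance between unit-norm expression
    # centroids and unit-norm anchors, averaged over subclusters.
    c = l2_normalize(cluster_centroids(l2_normalize(z), labels, anchors.shape[0]))
    return float(np.mean(np.sum((c - l2_normalize(anchors)) ** 2, axis=1)))


def total_loss(z, anchors, labels, w_con=1.0, w_cen=1.0):
    # Weighted sum of the two sketched terms; h-space alignment is not modeled.
    return (w_con * anchor_contrastive_loss(z, anchors, labels)
            + w_cen * centroid_alignment_loss(z, anchors, labels))


def top_k_retrieval_accuracy(z, anchors, labels, k=5):
    # For each subcluster, rank all anchors by cosine similarity to its
    # expression centroid and score a hit if the true anchor is in the top k.
    n_clusters = anchors.shape[0]
    c = l2_normalize(cluster_centroids(z, labels, n_clusters))
    sims = c @ l2_normalize(anchors).T  # shape (n_clusters, n_clusters)
    topk = np.argsort(-sims, axis=1)[:, :k]
    hits = np.any(topk == np.arange(n_clusters)[:, None], axis=1)
    return float(hits.mean())
```

As a usage example, synthetic embeddings placed near their anchors (for instance `z = anchors[labels] + 0.1 * noise`) should give a low contrastive loss and top-5 accuracy of 1.0, while permuting `labels` should raise the contrastive loss sharply. Keeping the anchors fixed means only the expression side moves, which is the design choice the abstract describes for treating CNV subclusters as genomic anchors.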
Submission Number: 77