Abstract: Somatic tumors have a high-dimensional, sparse, and small sample size nature, making cancer subtype stratification based on somatic genomic data a challenge. Current methods for improving cancer clustering performance focus on dimension reduction, integrating multi-omics data, or generating realistic samples, yet ignore the associations between mutated genes within the patient-gene matrix. We refer to these associations as gene mutation structural information, which implicitly includes cancer subtype information and can enhance subtype clustering. We introduce a novel method for cancer subtype clustering called SIG(Structural Information within Graph). As cancer is driven by a combination of genes, we establish associations between mutated genes within the same patient sample, pair by pair, and use a graph to represent them. An association between two mutated genes corresponds to an edge in the graph. We then merge these associations among all mutated genes to obtain a structural information graph, which enriches the gene network and improves its relevance to cancer clustering. We integrate the somatic tumor genome with the enriched gene network and propagate it to cluster patients with mutations in similar network regions. Our method achieves superior clustering performance compared to SOTA methods, as demonstrated by clustering experiments on ovarian and LUAD datasets.
Loading