A Unified Genetic and Epigenetic Model to Predict Breast Cancer Intrinsic Subtypes Using Large DNA-Level Multi-Omics Data and Hierarchical Learning

Xintong Chang, Jiemin Xie, Hongyu Duan, Keyi Li, Xuemei Liu, Yunhui Xiong, Xiangqi Bai, Kaida Ning, Li C. Xia

Published: 01 Nov 2025, Last Modified: 30 Jan 2026, IEEE Transactions on Computational Biology and Bioinformatics, CC BY-SA 4.0
Abstract: Breast cancer subtyping remains a significant clinical and scientific challenge. The prevalent expression-based Prediction Analysis of Microarray 50 (PAM50) system and its immunohistochemistry (IHC) surrogate tests show substantial inconsistencies and are not applicable to rapidly advancing circulating tumor DNA screening. We developed Unified Genetic and Epigenetic Subtyping (UGES), a new intrinsic subtype classifier that integrates large-scale DNA-level omics data with a hierarchical learning algorithm. Our benchmarks showed that both multi-step hierarchical learning and the use of all DNA-level alteration data are crucial, improving the overall AUC score by over 8.3% compared to the one-step multi-class classification method. Based on these insights, we built UGES as a three-step classifier based on 50,831 DNA features of 2,065 samples, including mutations, copy number aberrations, and methylation. UGES achieved an overall AUC score of 0.963 and greatly improved the clinical stratification of real-world patients: the survival differences among subtype strata became statistically more significant, P = 9.7e-55 (UGES) vs. 2.2e-47 (PAM50). Finally, UGES identified 52 subtype-specific DNA biomarkers that can be targeted by early screening technologies to expand the time window for precision care. The UGES code is freely available at https://github.com/labxscut/UGES.