Adaptive Clustering for EGFR Amplification Prediction in Glioblastoma: A Variational Autoencoder-Dirichlet Bayesian Gaussian Approach

Homay Danaei Mehr, Cong Cong, Imran Noorani, Antonio Di Ieva, Sidong Liu

Published: 01 Jan 2025, Last Modified: 04 Nov 2025, AIME (1) 2025, CC BY-SA 4.0

Abstract: Glioblastoma (GBM), an aggressive brain tumor, is notorious for its resistance to treatment due to its high heterogeneity and rapid growth. The epidermal growth factor receptor (EGFR) is an important diagnostic, prognostic, and therapeutic biomarker in GBM. With advances in digital pathology, deep learning models, especially Multiple Instance Learning (MIL)-based approaches, have achieved promising results in tumor classification. However, MIL models are often task-specific, which limits their generalizability. Alternatively, the morphological redundancy in tissue can be leveraged to build task-agnostic slide representations in an unsupervised manner, as in the recently proposed morphological prototype-based PANTHER model. PANTHER can improve classification performance; however, its K-Means clustering depends on a fixed, predefined number of prototypes, which may cause over- or under-clustering and thereby reduce classification performance. To address this limitation, we propose an adaptive Variational Autoencoder-Dirichlet Bayesian Gaussian Mixture Model (VAE-DBGMM) that learns the optimal prototypes dynamically. Using the TCGA-GBM dataset with EGFR labels, we evaluated our adaptive approach against the PANTHER model with predefined numbers of prototypes (8, 16, 18, 32) and three state-of-the-art MIL models (CLAM, TransMIL, and DTFD). The results demonstrate that the optimal prototypes derived from VAE-DBGMM significantly improved classification performance, achieving an AUC of 0.795 ± 0.0105 and outperforming the PANTHER and MIL baselines. Furthermore, testing on the external CPTAC-EGFR dataset demonstrates the robustness and generalizability of our approach. These findings emphasise the significance of adaptive clustering for improving EGFR biomarker classification in GBM.
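The contrast between a fixed number of prototypes and an adaptively chosen one can be sketched with scikit-learn's `BayesianGaussianMixture` under a Dirichlet-process weight prior. This is a minimal illustration under stated assumptions, not the authors' VAE-DBGMM. The VAE encoder is not implemented: synthetic Gaussian blobs stand in for its latent patch codes. The use of a Dirichlet-process prior (rather than a finite Dirichlet distribution) is an assumption. The helper `effective_prototypes` and the values of `max_components`, `weight_threshold`, and the concentration prior are hypothetical. Components whose mixture weight collapses toward zero are treated as unused, so the number of surviving prototypes is learned from the data rather than fixed in advance.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import BayesianGaussianMixture


def effective_prototypes(latents, max_components=10, weight_threshold=0.01, seed=0):
    """Hypothetical helper: fit a Dirichlet-process Bayesian GMM and keep
    only the components whose mixture weight exceeds `weight_threshold`.

    `max_components` is only an upper bound (truncation level); the DP prior
    lets superfluous components shrink toward zero weight.
    """
    dpgmm = BayesianGaussianMixture(
        n_components=max_components,
        weight_concentration_prior_type="dirichlet_process",
        weight_concentration_prior=0.01,  # assumed value; small values favour fewer components
        covariance_type="full",
        max_iter=1000,
        random_state=seed,
    ).fit(latents)
    keep = dpgmm.weights_ > weight_threshold
    return dpgmm.means_[keep], dpgmm.weights_[keep]


# Hypothetical stand-in for VAE latent codes of the patches from one slide:
# three well-separated Gaussian blobs in an 8-dimensional latent space.
rng = np.random.default_rng(0)
centres = np.array([[0.0] * 8, [6.0] * 8, [-6.0] * 8])
latents = np.vstack([c + rng.normal(scale=0.5, size=(300, 8)) for c in centres])

prototypes, weights = effective_prototypes(latents)
print("adaptive prototypes:", len(prototypes))

# Fixed-K baseline: K-Means always returns exactly K prototypes, whatever the data.
kmeans = KMeans(n_clusters=8, n_init=10, random_state=0).fit(latents)
print("fixed-K prototypes:", len(kmeans.cluster_centers_))
```

Here `max_components` acts only as a ceiling, whereas K-Means must be given K up front. This is the fixed-prototype limitation the abstract attributes to PANTHER. On well-separated synthetic blobs, the adaptive count typically lands near the true number of clusters. On real slide features, both the result and the appropriate threshold would need to be checked empirically.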

External IDs: dblp:conf/aime/MehrCNIL25