GI-Clust: Deep Clustering for Early Gastrointestinal Cancer Detection

Published: 23 Sept 2025, Last Modified: 18 Oct 2025 | TS4H NeurIPS 2025 | CC BY 4.0
Keywords: Predictive Clustering, Time-series Modelling, Interpretable Representation Learning, Phenotype Discovery / Risk Stratification, Early GI cancer detection, Modelling Primary Care Data
TL;DR: GI-Clust is a novel deep clustering model that learns from sparse multivariate primary care EHR time series to improve early GI cancer detection and identify interpretable phenotypes.
Abstract: Early diagnosis of gastrointestinal (GI) cancers remains challenging due to non-specific symptom presentation and the limitations of existing risk stratification tools in primary care. Current models are predominantly static, failing to capture how patient trajectories evolve within electronic health records (EHRs). We present GI-Clust, a predictive deep clustering framework designed for irregular, multivariate EHR time series. GI-Clust employs a dual-encoder architecture: an LSTM-based attention encoder for temporal features with an integrated interpretability framework, and a lightweight MLP encoder for baseline risk factors, fused via a gated mechanism. Latent embeddings are clustered using a Gumbel-Softmax layer, enabling differentiable optimisation. The framework jointly optimises prediction and clustering objectives to uncover clinically interpretable patient subgroups. Evaluated on 210,970 UK primary care patients from the QResearch database, GI-Clust outperforms strong baselines, including XGBoost, LSTM-Encoder, and CAMELOT, achieving AU-ROC 0.870 and F1 0.380, while identifying phenotype-specific feature–time dependencies (e.g., haemoglobin in the six months prior to diagnosis across GI cancer subtypes). Crucially, the model generalises well to geographically distinct test regions, demonstrating robustness. To our knowledge, this is the first predictive clustering approach applied to longitudinal UK primary care data for cancer detection.
Submission Number: 41
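The abstract names two mechanisms that a small example may make more concrete: gated fusion of the temporal and baseline embeddings, and a Gumbel-Softmax layer that yields differentiable soft cluster assignments. The following is a minimal, dependency-free sketch of those two ideas only, not the authors' implementation. The function names, the element-wise sigmoid gate, the choice to pass gate logits in directly (a trained model would presumably compute them with learned layers), and all numeric values are assumptions made for illustration.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def gumbel_softmax(logits, temperature, rng):
    """Relaxed (soft) sample from a categorical distribution over clusters.

    Adds Gumbel(0, 1) noise, g = -log(-log(U)) with U ~ Uniform(0, 1),
    to each logit and applies a temperature-scaled softmax.
    """
    noisy = []
    for x in logits:
        # Clamp U away from 0 and 1 so both logarithms are defined.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        noisy.append(x - math.log(-math.log(u)))
    return softmax(noisy, temperature)


def gated_fusion(h_temporal, h_static, gate_logits):
    """Element-wise gate: z = g * h_temporal + (1 - g) * h_static.

    Assumption: the gate is a per-dimension sigmoid; its logits are
    supplied by the caller here rather than learned.
    """
    gates = [1.0 / (1.0 + math.exp(-g)) for g in gate_logits]
    return [g * t + (1.0 - g) * s
            for g, t, s in zip(gates, h_temporal, h_static)]


# Hypothetical 2-dimensional embeddings and 3 cluster logits.
rng = random.Random(0)
fused = gated_fusion([1.0, -0.5], [0.2, 0.4], [0.0, 0.0])
cluster_probs = gumbel_softmax([2.0, 0.5, -1.0], temperature=0.5, rng=rng)
print(fused, cluster_probs)
```

Lower temperatures push the soft assignment towards a one-hot cluster choice, while higher temperatures keep it closer to uniform; because the output is a smooth function of the logits, gradients from a prediction loss can flow back through the clustering step.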