Robustness of Multimodal Foundation-Model Forecasting for Postoperative Cancer Outcomes

KuanTing Wu

Published: 11 Jun 2026, Last Modified: 11 Jun 2026. Venue: Forecast@ICML26 Poster. License: CC BY 4.0.

Keywords: AI forecasting; multimodal foundation models; medical imaging; computational pathology; computed tomography; whole-slide imaging; survival analysis; postoperative recurrence prediction

TL;DR: Frozen CT-WSI foundation embeddings enable feasible postoperative cancer forecasting, but their value is most defensible as clinically anchored incremental signal rather than as a standalone replacement for clinical risk models.

Abstract: Postoperative outcome forecasting is a stringent test of whether frozen medical foundation-model embeddings can support clinically meaningful intelligence: predictions must remain useful under censoring, limited event counts, and comparison with established clinical anchors. We study two-year disease-free survival forecasting in an anonymized in-house resected NSCLC/LUAD cohort with paired preoperative computed tomography (CT) and postoperative hematoxylin-and-eosin whole-slide images (WSI). Using frozen patient-level embeddings, we evaluate a complete 2 x 2 matrix of CT foundation models (Pillar-0, CT-FM) and pathology foundation models (TITAN, Prov-GigaPath), together with WSI-only, CT-only, and score-level fusion models. Simple late averaging is the most stable fusion rule across all model combinations; the strongest TITAN plus Pillar-0 setting reaches a C-index of 0.799 and AUROC of 0.810, improving over its matched WSI-only baseline. However, a stage-only clinical anchor reaches a C-index of 0.837 and AUROC of 0.840 in the same task, and an exploratory TCGA-KIRC stress test similarly favors clinical/Leibovich-like baselines over frozen image embeddings. These results support a clinically anchored view of multimodal foundation embeddings: they are scalable forecasting substrates, but their value should be judged by incremental benefit, calibration, and robustness rather than by standalone image-only performance.
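
To make the score-level fusion and the C-index metric mentioned in the abstract concrete, here is a minimal sketch on synthetic toy data. It is not the authors' implementation: the function names `late_average` and `harrell_c_index`, the z-score standardization applied before averaging, and all numbers in the toy arrays are assumptions made for illustration only. The abstract states only that "simple late averaging" of modality scores was the most stable fusion rule.

```python
import numpy as np


def late_average(score_a, score_b):
    """Late (score-level) fusion: standardize each modality's risk scores,
    then take their unweighted mean.

    The z-scoring step is an assumption of this sketch, added so that the
    two modalities contribute on a comparable scale.
    """
    a = np.asarray(score_a, dtype=float)
    b = np.asarray(score_b, dtype=float)
    za = (a - a.mean()) / (a.std() + 1e-12)
    zb = (b - b.mean()) / (b.std() + 1e-12)
    return 0.5 * (za + zb)


def harrell_c_index(time, event, risk):
    """Harrell's concordance index for right-censored data.

    A pair (i, j) is comparable when patient i had an observed event and
    patient j was followed strictly longer. The pair is concordant when
    the earlier-failing patient i has the higher predicted risk; ties in
    risk count as half. Ties in follow-up time are skipped (simplification).
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    risk = np.asarray(risk, dtype=float)
    concordant = 0.0
    comparable = 0
    for i in range(len(time)):
        if not event[i]:
            continue  # censored patients cannot anchor a comparable pair
        for j in range(len(time)):
            if time[j] > time[i]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")


# Synthetic toy cohort (hypothetical values, months of follow-up).
time = [5, 8, 12, 20, 24, 24]
event = [1, 1, 0, 1, 0, 0]  # 1 = recurrence or death observed, 0 = censored
wsi_risk = [0.9, 0.4, 0.5, 0.3, 0.2, 0.1]  # stand-in for a WSI-only model
ct_risk = [0.7, 0.8, 0.3, 0.4, 0.1, 0.2]   # stand-in for a CT-only model

fused_risk = late_average(wsi_risk, ct_risk)
print("WSI-only C-index:", harrell_c_index(time, event, wsi_risk))
print("CT-only C-index: ", harrell_c_index(time, event, ct_risk))
print("Fused C-index:   ", harrell_c_index(time, event, fused_risk))
```

In a comparison like the one the abstract describes, the same `harrell_c_index` call would also be applied to a clinical anchor such as a stage-only risk score, so that image-based and clinical models are judged on identical patients and censoring.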

Submission Number: 64
