Keywords: diabetic retinopathy, time series, vision–language model, multimodal learning, disease progression forecasting, clinical risk prediction, medical AI, interpretability
TL;DR: We propose a time-series vision–language model that combines retinal fundus images with structured clinical prompts to forecast diabetic retinopathy progression up to three years.
Abstract: Early detection of diabetic retinopathy (DR) progression is critical for timely intervention and prevention of vision loss. We present a time-series vision–language model that integrates longitudinal clinical context with retinal fundus images to forecast progression to referable DR at 1-, 2-, and 3-year horizons. The framework aligns fundus photographs with structured narrative prompts that encode demographics, diabetes history, and prior screening outcomes. Training is formulated as a contrastive objective, encouraging image embeddings to align with the correct horizon-specific outcome hypothesis. Using a national screening dataset of more than one million visits, we show that incorporating longitudinal information into the prompts consistently improves predictive performance, with the best one-year configuration achieving an AUROC of 0.707. The approach offers two key advantages: interpretability, by conditioning predictions on explicit clinical narratives, and extensibility, by allowing prompts to be adapted or enriched with additional timepoint information. To our knowledge, this is the first vision–language framework for horizon-specific DR forecasting, establishing a simple and reproducible baseline for adaptive recall scheduling, triage, and population-level risk management in DR screening programmes.
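The contrastive objective described above can be illustrated with a minimal sketch. This is not the authors' implementation; it assumes a standard CLIP-style symmetric InfoNCE loss, where each fundus image embedding should score highest against its matching horizon-specific prompt embedding (the diagonal of the similarity matrix). All names, dimensions, and the temperature value are illustrative assumptions.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Project embeddings onto the unit sphere so dot products are cosines."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def contrastive_loss(image_emb, prompt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch: each image is pulled toward its
    own horizon-specific prompt and pushed away from the other prompts.
    (Illustrative sketch; temperature and batch setup are assumptions.)"""
    img = l2_normalize(image_emb)
    txt = l2_normalize(prompt_emb)
    logits = img @ txt.T / temperature        # (B, B) image-prompt similarities
    labels = np.arange(len(logits))           # matching pairs lie on the diagonal

    def xent(lg, lb):
        lg = lg - lg.max(axis=1, keepdims=True)               # numerical stability
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(lb)), lb].mean()

    # Average the image-to-prompt and prompt-to-image directions.
    return 0.5 * (xent(logits, labels) + xent(logits.T, labels))

rng = np.random.default_rng(0)
B, D = 4, 32
img = rng.normal(size=(B, D))
# Perfectly aligned prompts give a much lower loss than unrelated ones.
aligned = contrastive_loss(img, img)
random_ = contrastive_loss(img, rng.normal(size=(B, D)))
```

In this framing, inference for a given horizon reduces to comparing an image embedding against the embeddings of the candidate outcome hypotheses (e.g. "progresses to referable DR within 1 year" vs. "does not") and reading off the similarity.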
Submission Number: 10