In-Context Learning for Data-Efficient Diabetic Retinopathy Detection via Multimodal Foundation Models
Abstract: Objective: This study aims to evaluate whether in-context learning (ICL), a prompt-based learning mechanism that enables multimodal foundation models to adapt rapidly to new tasks without retraining or large annotated datasets, can achieve diagnostic performance comparable to that of domain-specific foundation models. Specifically, we use diabetic retinopathy (DR) detection from color fundus photographs (CFPs) as an exemplar task to probe whether a multimodal foundation model (Google Gemini 1.5 Pro), employing ICL, can match the performance of a domain-specific model (RETFound) fine-tuned explicitly for DR detection.
Design: A cross-sectional study.
Subjects: A retrospective, publicly available dataset (Indian Diabetic Retinopathy Image Dataset) comprising 516 CFPs collected at an eye clinic in India, featuring both healthy individuals and patients with DR.
Methods: The images were dichotomized into 2 groups based on the presence or absence of any signs of DR. RETFound was fine-tuned for this binary classification task, whereas Gemini 1.5 Pro was evaluated on the same task under zero-shot and few-shot prompting, with the few-shot setting using either random or k-nearest-neighbors-based sampling of a varying number of example images. For the experiments, data were partitioned into training, validation, and test sets in a stratified manner, and this process was repeated for 10-fold cross-validation.
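The two few-shot sampling strategies can be sketched as follows. This is a minimal, hypothetical illustration rather than the authors' code: the abstract does not say which image representation or distance measure was used, so the sketch assumes each CFP has already been mapped to an embedding vector and ranks labelled training images by cosine similarity to the query. The helper names `select_knn_examples` and `select_random_examples` are illustrative.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def select_knn_examples(query_emb, pool, k):
    """Hypothetical k-NN sampling: return the k labelled pool items whose
    (assumed) embeddings are most similar to the query image's embedding.

    pool: list of (embedding, label) pairs drawn from the training split.
    """
    ranked = sorted(
        pool,
        key=lambda item: cosine_similarity(query_emb, item[0]),
        reverse=True,  # most similar first
    )
    return ranked[:k]


def select_random_examples(pool, k, seed=0):
    """Random-sampling baseline: k examples drawn without replacement."""
    rng = random.Random(seed)
    return rng.sample(pool, k)


if __name__ == "__main__":
    # Toy 2-D "embeddings" standing in for real image features.
    pool = [
        ([1.0, 0.0], "no DR"),
        ([0.9, 0.1], "no DR"),
        ([0.0, 1.0], "DR"),
        ([0.1, 0.9], "DR"),
    ]
    query = [0.05, 1.0]
    print(select_knn_examples(query, pool, 2))
    print(select_random_examples(pool, 2))
```

In a few-shot prompt, the selected image–label pairs would be placed before the query image so the model can condition on visually similar, labelled cases; the random strategy serves as the baseline that ignores similarity.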
Main Outcome Measures: Performance was assessed via accuracy, F1 score, and expected calibration error of predicted probabilities. Statistical significance was evaluated using Wilcoxon tests.
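Expected calibration error (ECE) measures the gap between how confident a model is and how often it is right: predictions are grouped into confidence bins B_b, and ECE = Σ_b (|B_b| / n) · |acc(B_b) − conf(B_b)|. The abstract does not specify the binning scheme, so this sketch assumes 10 equal-width bins and top-label confidence for the binary DR/no-DR decision. Treat these as illustrative choices, not the study's exact protocol.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Equal-width-binned ECE for binary predictions (illustrative sketch).

    probs:  predicted probability of the positive class (DR present) per image.
    labels: ground-truth labels (1 = DR, 0 = no DR).
    Assumes top-label confidence: prediction = (p >= 0.5),
    confidence = max(p, 1 - p).
    """
    n = len(probs)
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        pred = 1 if p >= 0.5 else 0
        conf = max(p, 1.0 - p)
        # Bin i covers [i / n_bins, (i + 1) / n_bins); clamp conf == 1.0 into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, 1.0 if pred == y else 0.0))

    ece = 0.0
    for b in bins:
        if not b:
            continue  # empty bins contribute nothing
        avg_conf = sum(c for c, _ in b) / len(b)
        acc = sum(a for _, a in b) / len(b)
        ece += (len(b) / n) * abs(acc - avg_conf)  # weight by bin size
    return ece


if __name__ == "__main__":
    # All predictions 90% confident and all correct: underconfident by ~0.1.
    print(expected_calibration_error([0.9, 0.9, 0.1, 0.1], [1, 1, 0, 0]))
```

For the paired comparison between models, `scipy.stats.wilcoxon(x, y)` performs a Wilcoxon signed-rank test on two matched samples, such as the per-fold ECE values of each model across the cross-validation folds.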
Results: The best ICL performance with Gemini 1.5 Pro yielded an average accuracy of 0.841 (95% confidence interval [CI]: 0.803–0.879), an F1 score of 0.876 (95% CI: 0.844–0.909), and a calibration error of 0.129 (95% CI: 0.107–0.152). RETFound achieved an average accuracy of 0.849 (95% CI: 0.813–0.885), an F1 score of 0.883 (95% CI: 0.852–0.915), and a calibration error of 0.081 (95% CI: 0.066–0.097). While accuracy and F1 scores were comparable (P > 0.3), RETFound’s calibration was superior (P = 0.004).
Conclusions: Gemini 1.5 Pro with ICL demonstrated performance comparable to RETFound for binary DR detection, illustrating how future medical artificial intelligence systems may build upon such frontier models rather than being developed as bespoke solutions.