Exploring Visual Prompt Tuning for Demographic Adaptation in Foundation Models for Medical Imaging

Published: 10 Oct 2024, Last Modified: 19 Nov 2024 · AFM 2024 Poster · CC BY 4.0
Keywords: Machine Learning, Classification, Medical Imaging, Prompt Tuning, Foundation Models, Transfer Learning
TL;DR: We compare three strategies for adapting foundation models pre-trained on medical images, including adaptation to specific demographics.
Abstract: Pre-trained medical foundation models are large and require significant computational resources to train. Visual Prompt Tuning (VPT) allows foundation models to adapt efficiently to new tasks with minimal changes to the model's architecture, reducing the need for extensive fine-tuning. Here, we explore demographic (race) adaptation of foundation models (MAE and MoCoV3) for disease classification in medical imaging using naturally imbalanced data. We compare three adaptation strategies: linear probing, full fine-tuning, and VPT. We find that VPT obtains a clear performance boost over linear probing, starting at prompt length 5. For a race demographic such as Asian (5.7\% of the full dataset), a VPT model trained on that demographic performed similarly to a fully fine-tuned model trained on the same dataset. A foundation model fully fine-tuned on a large, diverse dataset performs better than a model adapted only to a specific subset of the data; however, it requires large amounts of data and compute, which may not always be available. These findings show that VPT can efficiently adapt foundation models to small datasets, achieving performance comparable to full fine-tuning.
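The core mechanism of (shallow) VPT described above is to prepend a small number of learnable prompt tokens to the patch-embedding sequence of a frozen Vision Transformer, so that only the prompts and the classification head are trained. The sketch below illustrates this with numpy; the function name and ViT-B/16 dimensions are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def prepend_prompts(patch_embeddings, prompt_tokens):
    """Shallow VPT sketch: insert learnable prompt tokens between the
    CLS token and the patch tokens before the first transformer block.

    patch_embeddings: (batch, 1 + n_patches, dim), CLS token first.
    prompt_tokens:    (prompt_len, dim) -- in VPT, these (plus the head)
                      are the only trained parameters; the backbone stays frozen.
    """
    batch = patch_embeddings.shape[0]
    # Share the same prompt tokens across the whole batch.
    prompts = np.broadcast_to(prompt_tokens, (batch,) + prompt_tokens.shape)
    cls_tok = patch_embeddings[:, :1, :]
    patches = patch_embeddings[:, 1:, :]
    return np.concatenate([cls_tok, prompts, patches], axis=1)

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 197, 768))  # e.g. ViT-B/16: 196 patches + CLS
p = rng.normal(size=(5, 768))       # prompt length 5, as in the abstract
out = prepend_prompts(x, p)
print(out.shape)  # (2, 202, 768)
```

With prompt length 5 and dim 768, only 5 × 768 = 3,840 backbone-side parameters are trained, versus ~86M for fully fine-tuning a ViT-B backbone, which is why VPT is attractive for small demographic subsets.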
Submission Number: 123