Frequency Strikes Back: Boosting Parameter-Efficient Foundation Model Adaptation for Medical Imaging

Published: 23 Sept 2025, Last Modified: 16 Nov 2025 · MICCAI · CC BY 4.0
Abstract: Adapting vision transformer (ViT) foundation models with parameter-efficient fine-tuning (PEFT) has become increasingly popular in medical imaging, enabling efficient adaptation while updating only a small subset of parameters. However, existing PEFT methods process tokens independently, overlooking cross-token dependencies and limiting their ability to capture global contextual information. To address these limitations, we propose FreqFiT, a novel Frequency-based Fine-Tuning module inserted between ViT blocks to enhance model adaptability. FreqFiT is effective on its own and integrates seamlessly with existing PEFT methods to improve their performance. We evaluate FreqFiT across 2D and 3D medical imaging datasets, including PAPILA, HAM10000, ADNI-1.5T, and COVID-CT-MD. It improves accuracy by 9% and AUC by 10%, surpassing the original PEFT methods on both MedMAE and DINOv2 backbones. Despite using ≤ 1.2% of the parameters updated in full fine-tuning, FreqFiT achieves state-of-the-art medical imaging adaptation efficiently. The source code is available here.
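The abstract's key idea — recovering cross-token dependencies with a frequency-domain module between ViT blocks — can be sketched as a learnable global filter applied in the Fourier domain. The sketch below is an illustrative assumption, not the authors' implementation: the function name, tensor shapes, and the identity-filter check are all hypothetical.

```python
import numpy as np

def freq_filter_tokens(tokens, filt_real, filt_imag):
    """Hypothetical frequency-domain token mixing in the spirit of FreqFiT.

    tokens: (N, D) array of N patch tokens with D channels, N a perfect square.
    filt_real, filt_imag: learnable filter of shape (H, W//2 + 1, D), matching
    the np.fft.rfft2 output over the (H, W) spatial token grid.
    """
    n, d = tokens.shape
    h = w = int(np.sqrt(n))
    grid = tokens.reshape(h, w, d)
    # A 2D FFT over the token grid: every frequency bin mixes all tokens,
    # so an element-wise filter here captures global context cheaply.
    spec = np.fft.rfft2(grid, axes=(0, 1))
    spec = spec * (filt_real + 1j * filt_imag)  # learnable complex filter
    out = np.fft.irfft2(spec, s=(h, w), axes=(0, 1))
    return out.reshape(n, d)

# Sanity check: an identity filter (real part 1, imaginary part 0)
# must reproduce the input tokens exactly.
tokens = np.random.default_rng(0).standard_normal((16, 8))
filt_real = np.ones((4, 3, 8))
filt_imag = np.zeros((4, 3, 8))
out = freq_filter_tokens(tokens, filt_real, filt_imag)
```

Because the filter acts element-wise in the frequency domain, it adds only O(N·D) parameters per module while still mixing information across all tokens, which is consistent with the abstract's ≤ 1.2% parameter budget.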