Sparsity- and Hybridity-Inspired Visual Parameter-Efficient Fine-Tuning for Medical Diagnosis

Published: 01 Jan 2024, Last Modified: 04 Nov 2024MICCAI (5) 2024EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: The success of Large Vision Models (LVMs) is accompanied by vast data volumes, which are prohibitively expensive in medical diagnosis. To address this, recent efforts exploit Parameter-Efficient Fine-Tuning (PEFT), which trains a small number of weights while freezing the rest for knowledge transfer. However, they typically assign trainable weights to the same positions in LVMs in a heuristic manner, regardless of task differences, making them suboptimal for professional applications like medical diagnosis. To address this, we statistically reveal the nature of sparsity and hybridity during diagnostic-targeted fine-tuning, i.e., a small portion of key weights significantly impacts performance, and these key weights are hybrid, including both task-specific and task-agnostic parts. Based on this, we propose a novel Sparsity- and Hybridity-inspired Parameter Efficient Fine-Tuning (SH-PEFT). It selects and trains a small portion of weights based on their importance, which is innovatively estimated by hybridizing both task-specific and task-agnostic strategies. Validated on six medical datasets of different modalities, we demonstrate that SH-PEFT achieves state-of-the-art performance in transferring LVMs to medical diagnosis in terms of accuracy. By tuning around 0.01\(\%\) number of weights, it outperforms full model fine-tuning. Moreover, SH-PEFT also performs comparably to other models deliberately optimized for specific medical tasks. Extensive experiments demonstrate the effectiveness of each design and reveal the great potential of pre-trained LVM transfer for medical diagnosis.
Loading