Parameter Efficient Fine-Tuning of Large Vision Foundational Models for Multi-Channel Medical Image Segmentation

Submitted to MIDL 2025 - Short Papers. 11 Apr 2025 (modified: 01 May 2025). License: CC BY 4.0
Keywords: Low-Rank Adaptation (LoRA), Multi-Channel Data, Finetuning, Medical Image Segmentation, Vision Transformers (ViT), UNetR
TL;DR: We propose integrating channel-wise LoRA adapters into the UNetR framework, efficiently adapting single-channel pre-trained Vision Transformers for multi-channel medical image segmentation, reducing trainable parameters by ~40%.
Abstract: Multi-channel data in medical imaging, where each modality encodes distinct and complementary information, is critical for accurate 3D segmentation. The UNetR architecture has demonstrated success in 3D medical image segmentation by integrating a transformer-based encoder with a convolutional decoder. However, full fine-tuning of UNetR for new multi-channel tasks is computationally expensive and prone to over-fitting, especially with limited data and large transformer backbones. Moreover, conventional transformer models, such as Vision Transformers, are typically pre-trained on single-channel images, limiting their direct applicability to multi-modal imaging tasks. To address this, we propose a parameter-efficient fine-tuning strategy using channel-wise Low-Rank Adaptation (LoRA) adapters within the UNetR encoder, enabling scalable multi-channel adaptation with reduced parameter overhead.
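The idea of channel-wise LoRA can be sketched as follows: a minimal NumPy illustration in which each input channel gets its own trainable low-rank pair (A_c, B_c) while the pre-trained weight stays frozen. The shapes, rank, and initialization here are assumptions for illustration, not the paper's exact implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, channels = 768, 8, 4        # hidden size, LoRA rank, input channels (all illustrative)

W = rng.standard_normal((d, d))   # frozen pre-trained projection (e.g. a ViT attention weight)
A = [rng.standard_normal((r, d)) * 0.01 for _ in range(channels)]  # trainable, per channel
B = [np.zeros((d, r)) for _ in range(channels)]                    # zero-init: no update at start

def lora_forward(x, c, alpha=16):
    """Frozen weight plus the channel-c low-rank update (alpha/r) * B_c A_c x."""
    return x @ W.T + (alpha / r) * (x @ A[c].T @ B[c].T)

x = rng.standard_normal((2, d))   # toy batch of tokens from channel 0
y = lora_forward(x, c=0)

full = W.size                                        # params to fully fine-tune W
lora = sum(a.size + b.size for a, b in zip(A, B))    # trainable adapter params across channels
print(y.shape, round(lora / full, 3))                # adapters are a small fraction of W
```

At initialization B_c = 0, so the adapted layer reproduces the frozen backbone exactly; only the low-rank pairs (here well under 10% of one weight matrix) receive gradients during fine-tuning.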
Submission Number: 69
