TL;DR: Extended parameter-efficient pre-training enables pre-trained ViTs such as DinoV2 and MAE to transfer effectively and efficiently to new visual domains (e.g., satellite or medical imagery)
Abstract: Parameter-efficient fine-tuning (PEFT) techniques such as low-rank adaptation (LoRA) can effectively adapt large pre-trained foundation models to downstream tasks using only a small fraction (0.1%-10%) of the original trainable weights. An under-explored question in PEFT is extending the pre-training phase without supervised labels; that is, can we adapt a pre-trained foundation model to a new domain via efficient self-supervised pre-training on that domain? In this work, we introduce ExPLoRA, a highly effective technique to improve transfer learning of pre-trained vision transformers (ViTs) under domain shifts. Initializing a ViT with weights pre-trained on large natural-image datasets, such as those from DinoV2 or MAE, ExPLoRA continues the unsupervised pre-training objective on a new domain, unfreezing 1-2 pre-trained ViT blocks and tuning all other layers with LoRA. We then fine-tune the resulting model on the new domain for supervised learning, using LoRA only. Our experiments demonstrate state-of-the-art results on satellite imagery, even outperforming fully pre-trained and fine-tuned ViTs. Using the DinoV2 training objective, we demonstrate up to an 8% improvement in linear-probing top-1 accuracy on downstream tasks while using <10% of the parameters used in prior fully-tuned state-of-the-art approaches. Our ablation studies confirm the efficacy of our approach over other baselines, including conventional PEFT. Code is available at: https://samar-khanna.github.io/ExPLoRA/
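To make the parameter scheme described above concrete, here is a minimal PyTorch sketch. It assumes a timm-style ViT whose transformer blocks expose `attn.qkv` and `attn.proj` linear layers; the hand-rolled LoRA wrapper, the choice of which layers are adapted, and the rank are illustrative assumptions, not the paper's released implementation.

```python
# Minimal sketch of the ExPLoRA parameter scheme (assumptions noted above).
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen nn.Linear with a trainable low-rank update: W x + (alpha/r) B A x."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # keep the pre-trained weight frozen
        # Standard LoRA init: A small Gaussian, B zero, so the update starts at zero.
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)

def configure_explora(vit: nn.Module, unfrozen_blocks: int = 1, rank: int = 8):
    """Freeze the ViT, fully unfreeze the last `unfrozen_blocks` blocks,
    and attach LoRA to the attention projections of all remaining blocks."""
    for p in vit.parameters():
        p.requires_grad = False
    blocks = list(vit.blocks)
    for block in blocks[-unfrozen_blocks:]:
        for p in block.parameters():
            p.requires_grad = True  # the 1-2 unfrozen blocks train in full
    for block in blocks[:-unfrozen_blocks]:
        block.attn.qkv = LoRALinear(block.attn.qkv, rank=rank)
        block.attn.proj = LoRALinear(block.attn.proj, rank=rank)
    return vit
```

In the ExPLoRA stage, a model configured this way would continue training with the original self-supervised objective (e.g., DinoV2 or MAE) on the new domain; supervised fine-tuning with LoRA only is a separate subsequent stage.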
Lay Summary: Training powerful AI models to analyze specialized images—like satellite photos for environmental monitoring or medical scans for disease detection—requires enormous computational resources that most researchers cannot afford. Current AI models trained on everyday photos perform poorly on these specialized domains, forcing scientists to build entirely new models at a cost of thousands of hours on specialized hardware (e.g., GPUs) and with significant environmental impact.
We developed ExPLoRA, an efficient method that takes existing AI models trained on natural images and adapts them to new domains by updating only a small fraction of the model's components. Instead of building new models from scratch, our approach lets the model continue to learn patterns from specialized images on its own—without needing humans to label examples—while preserving most of the original knowledge the model learned from natural, everyday photos.
Our method achieves better performance than training models from scratch while using a tenth of the computing power and producing an eighth of the carbon emissions. This makes cutting-edge AI accessible to researchers with limited resources, enabling more scientists to develop tools for climate monitoring, medical diagnosis, and agricultural assessment.
Link To Code: https://samar-khanna.github.io/ExPLoRA/
Primary Area: Deep Learning->Self-Supervised Learning
Keywords: lora, PEFT, parameter-efficient finetuning, parameter-efficient pre-training, vision transformer, ViT, domain adaptation, domain generalization, satellite images, foundation models
Submission Number: 9062