Beyond Fine-Tuning: The Present and Future of Parameter-Efficient Fine-Tuning in Vision Transformers

Published: 17 Jan 2026, Last Modified: 04 Feb 2026, TIME 2026 Oral, CC BY 4.0
Keywords: parameter-efficient, fine-tuning, vision transformers, pre-training
Abstract: The advent of Vision Transformers (ViTs) has significantly advanced computer vision, particularly image classification, through large-scale pretraining. However, the high computational and memory costs of full fine-tuning remain a major bottleneck for practical deployment across diverse downstream tasks. Parameter-Efficient Fine-Tuning (PEFT) has emerged as a promising paradigm to address this challenge by updating only a small subset of parameters while preserving the benefits of pretrained representations. In this survey, we present a focused review of PEFT techniques for ViTs in image classification. We introduce a structured taxonomy that categorizes existing approaches into additive-based, reparameterization-based, selective, hybrid, and inference-efficient tuning methods, highlighting their core design principles, strengths, and limitations. We further analyze evaluation protocols, benchmark results, and trade-offs between accuracy, efficiency, and scalability. Finally, we identify open challenges and outline promising directions for future research toward more robust, efficient, and deployable PEFT frameworks for vision.
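To make the PEFT principle concrete, below is a minimal sketch of one family from the taxonomy, reparameterization-based tuning in the LoRA style, applied to a single linear layer: the pretrained weights are frozen and only a small low-rank correction is trained. This is an illustrative example assuming standard PyTorch; the class name, rank, and scaling factor are assumptions chosen for exposition, not the survey's reference implementation.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pretrained linear layer plus a trainable low-rank update (illustrative)."""
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)   # freeze pretrained weights
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        # Only these rank * (in + out) parameters are updated during fine-tuning.
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Pretrained path plus scaled low-rank correction: W x + s * (B A) x
        return self.base(x) + self.scaling * (x @ self.lora_a.T @ self.lora_b.T)

# Usage sketch: wrap, e.g., the query/value projections of a ViT block and
# pass only parameters with requires_grad=True to the optimizer.
layer = LoRALinear(nn.Linear(768, 768), rank=4)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable}")  # 6,144 vs. 590,592 for full fine-tuning

Because lora_b is initialized to zeros, the wrapped layer initially reproduces the pretrained layer exactly, a common design choice in LoRA-style methods that makes fine-tuning start from the pretrained behavior.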
Submission Number: 15