Abstract: Fine-tuning pre-trained vision models for specific tasks is a common practice in computer vision. However, this process becomes more expensive and resource-intensive as models grow larger. Recently, parameter-efficient fine-tuning (PEFT) methods have emerged as a popular solution that improves training efficiency and reduces storage needs by tuning additional low-rank modules within pre-trained backbones. Despite their advantages, these methods struggle with limited representation capability and misalignment with pre-trained intermediate features. To address these issues, we introduce an innovative Multi-Kernel Kronecker Adaptation with Re-Scaling Transmission (KARST) for various recognition tasks. Specifically, KARST's multi-kernel design extends Kronecker projections horizontally and separates the adaptation matrices into multiple complementary spaces, reducing parameter dependency and creating more compact subspaces. In addition, it incorporates extra learnable re-scaling factors to better align with pre-trained feature distributions, allowing for more flexible and balanced feature aggregation. Extensive experiments on diverse downstream datasets validate that our KARST not only outperforms other PEFT counterparts across model types and data domains, but also surpasses full fine-tuning, with negligible inference cost due to its re-parameterization characteristics.
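To make the high-level idea concrete, below is a minimal PyTorch sketch of a multi-kernel Kronecker adapter with learnable per-kernel re-scaling factors, merged into a frozen weight at inference time. All names (`MultiKernelKroneckerAdapter`, `num_kernels`, `block`), the initialization, and the exact placement of the re-scaling factors are assumptions for illustration; the abstract does not specify KARST's precise parameterization.

```python
import torch
import torch.nn as nn


class MultiKernelKroneckerAdapter(nn.Module):
    """Illustrative sketch (not the paper's exact formulation): the weight update
    is a sum of Kronecker products, one per kernel, each weighted by a learnable
    re-scaling factor."""

    def __init__(self, dim: int, num_kernels: int = 4, block: int = 16):
        super().__init__()
        assert dim % block == 0, "dim must be divisible by the Kronecker block size"
        self.num_kernels = num_kernels
        outer = dim // block  # size of the 'outer' Kronecker factor
        # Each kernel k contributes A_k ⊗ B_k, a (dim x dim) update built from
        # two small factors instead of a single large low-rank pair.
        self.A = nn.Parameter(torch.randn(num_kernels, outer, outer) * 0.02)
        self.B = nn.Parameter(torch.zeros(num_kernels, block, block))  # zero init -> no update at start
        # Per-kernel re-scaling factors balancing adapted vs. pre-trained features.
        self.scale = nn.Parameter(torch.ones(num_kernels))

    def delta_weight(self) -> torch.Tensor:
        """Aggregate per-kernel Kronecker updates into one ΔW, which can be
        merged into the frozen weight for re-parameterized, zero-cost inference."""
        delta = torch.zeros_like(torch.kron(self.A[0], self.B[0]))
        for k in range(self.num_kernels):
            delta = delta + self.scale[k] * torch.kron(self.A[k], self.B[k])
        return delta

    def forward(self, x: torch.Tensor, frozen_weight: torch.Tensor) -> torch.Tensor:
        # Equivalent to applying the merged weight: (W + ΔW) x
        w = frozen_weight + self.delta_weight()
        return x @ w.t()


if __name__ == "__main__":
    dim = 64
    adapter = MultiKernelKroneckerAdapter(dim, num_kernels=4, block=16)
    frozen = torch.randn(dim, dim)  # stands in for a frozen pre-trained projection
    x = torch.randn(8, dim)
    print(adapter(x, frozen).shape)  # torch.Size([8, 64])
```

Because the update is a plain additive ΔW, it can be folded into the frozen weight after training, which is what keeps the extra inference cost negligible.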
External IDs: dblp:conf/icassp/0012DG0L25