MAO: Efficient Model-Agnostic Optimization of Prompt Tuning for Vision-Language Models

Published: 20 Mar 2025, Last Modified: 01 Apr 2025, ICME 2025, CC BY 4.0
Abstract: Although CLIP-based prompt tuning significantly enhances pre-trained Vision-Language Models, existing research focuses on reconstructing the model architecture, e.g., by adding extra loss terms or meta-networks. These approaches generally increase complexity and training cost. To maintain the efficiency of the tuning process, we propose plug-and-play **M**odel-**A**gnostic **O**ptimization (MAO) for prompt tuning. Without altering any components of the prompt tuning backbone, we introduce a Data-Driven Enhancement framework to optimize the distribution of the initial data, and incorporate an Alterable Regularization module to boost the task-specific feature processing pipeline, thereby improving overall performance while maintaining low computational cost. Extensive experiments demonstrate MAO's outstanding performance and efficiency. The code of MAO is available at: https://github.com/JREion/M.A.O.
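
To illustrate the plug-and-play idea described in the abstract, the sketch below wraps an arbitrary prompt-tuning backbone without modifying its internals, re-weighting training samples and adding a regularization term at the loss level. This is a minimal, hypothetical sketch: the class name `ModelAgnosticWrapper`, the hard-sample re-weighting, and the entropy regularizer are assumptions for illustration, not the published MAO implementation (see the repository linked above for the actual code).

```python
# Illustrative sketch only: the weighting scheme and regularizer below are
# assumptions, not the published MAO method.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ModelAgnosticWrapper(nn.Module):
    """Wraps a prompt-tuning backbone as a black box that maps images to logits.

    The backbone's components are left untouched; the wrapper only re-weights
    the per-sample losses and adds a regularization term, illustrating the
    "plug-and-play" optimization idea at the training-loop level.
    """

    def __init__(self, backbone: nn.Module, reg_weight: float = 0.1):
        super().__init__()
        self.backbone = backbone      # e.g., a CoOp-style prompt learner (assumption)
        self.reg_weight = reg_weight  # hypothetical strength of the regularization term

    def forward(self, images: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        logits = self.backbone(images)  # backbone forward pass, unchanged

        # Data-driven re-weighting (assumption): emphasize harder samples by
        # scaling each sample's loss with its current relative error.
        per_sample = F.cross_entropy(logits, labels, reduction="none")
        with torch.no_grad():
            weights = torch.softmax(per_sample, dim=0) * per_sample.numel()
        task_loss = (weights * per_sample).mean()

        # Alterable regularization (assumption): an entropy bonus that
        # discourages overconfident predictions from the tuned prompts.
        probs = logits.softmax(dim=-1)
        entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=-1).mean()
        return task_loss - self.reg_weight * entropy
```

Because the wrapper only touches the loss computation, it can in principle be dropped around any existing prompt-tuning backbone and optimizer without changing the backbone's architecture or training schedule.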