Abstract: Prompt learning has emerged as a valuable technique for adapting vision-language models (VLMs) to downstream tasks in specific domains, yielding strong performance on such tasks. However, existing prompt learning methods focus excessively on increasing model complexity while overlooking the inherent challenge of learning from limited data, which leads to overfitting. In this study, we address this issue by applying temperature scaling (TS) during training to improve confidence calibration. By sharpening predictions, TS reduces overfitting and enhances generalization across diverse datasets. Extensive experiments demonstrate that our approach improves the calibration of CLIP-based methods without compromising accuracy. Our method is model-agnostic and applicable to broader tasks, offering a simple yet effective solution for zero-shot and few-shot learning.
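The core mechanism the abstract refers to, dividing logits by a temperature before the softmax, can be sketched as follows. This is a minimal illustration, not the paper's implementation; the function name, example logits, and temperature values are hypothetical. A temperature below 1 sharpens the predicted distribution, while a temperature above 1 smooths it.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Softmax over logits scaled by a temperature.

    temperature < 1 sharpens the distribution (more confident);
    temperature > 1 smooths it (less confident).
    """
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative logits for a 3-class prediction (hypothetical values).
logits = [2.0, 1.0, 0.5]
p_sharp = softmax_with_temperature(logits, 0.5)  # sharpened predictions
p_plain = softmax_with_temperature(logits, 1.0)  # standard softmax

# Sharpening concentrates probability mass on the top class.
assert max(p_sharp) > max(p_plain)
```

Applying the scaled softmax inside the training loss (rather than only post hoc, as in classical calibration) is what lets the temperature influence the learned prompts themselves.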
External IDs: dblp:journals/access/NguyenP25