CoCoA-Mix: Confusion-and-Confidence-Aware Mixture Model for Context Optimization

Published: 01 May 2025, Last Modified: 18 Jun 2025, ICML 2025 poster, CC BY 4.0
TL;DR: CoCoA-Mix enhances specialization and generalization in prompt tuning using CoA-loss for refined decision boundaries and CoA-temp for confidence-based scaling.
Abstract: Prompt tuning, which adapts vision-language models by freezing model parameters and optimizing only the prompt, has proven effective for task-specific adaptations. The core challenge in prompt tuning is improving specialization for a specific task and generalization for unseen domains. However, frozen encoders often produce misaligned features, leading to confusion between classes and limiting specialization. To overcome this issue, we propose a confusion-aware loss (CoA-loss) that improves specialization by refining the decision boundaries between confusing classes. Additionally, we mathematically demonstrate that a mixture model can enhance generalization without compromising specialization. This is achieved using confidence-aware weights (CoA-weights), which adjust the weights of each prediction in the mixture model based on its confidence within the class domains. Extensive experiments show that CoCoA-Mix, a mixture model with CoA-loss and CoA-weights, outperforms state-of-the-art methods by enhancing specialization and generalization. Our code is publicly available at https://github.com/url-kaist/CoCoA-Mix.
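For intuition only, below is a minimal sketch of the confidence-weighted mixing idea described in the abstract: predictions from several tuned prompts are combined, with each prediction scaled per class by a confidence weight. The names (`mix_predictions`, `coa_weights`) and the exact normalization are assumptions made for illustration; they are not taken from the official repository, which should be consulted for the actual implementation.

```python
# Hypothetical sketch of confidence-weighted mixing of per-prompt predictions.
# Names and normalization choices are illustrative, not the authors' code.
import torch


def mix_predictions(logits_per_prompt: torch.Tensor,
                    coa_weights: torch.Tensor) -> torch.Tensor:
    """Combine class probabilities from several prompts.

    logits_per_prompt: (P, B, C) logits from P tuned prompts for B images, C classes.
    coa_weights:       (P, C) per-prompt, per-class confidence scores
                       (e.g., estimated on each class domain).
    Returns (B, C) mixed class probabilities.
    """
    probs = logits_per_prompt.softmax(dim=-1)        # (P, B, C) per-prompt probabilities
    w = coa_weights.softmax(dim=0).unsqueeze(1)      # normalize over prompts -> (P, 1, C)
    mixed = (w * probs).sum(dim=0)                   # confidence-weighted mixture, (B, C)
    return mixed / mixed.sum(dim=-1, keepdim=True)   # renormalize to valid probabilities


# Example: two prompts, a batch of 4 images, 10 classes.
logits = torch.randn(2, 4, 10)
weights = torch.zeros(2, 10)  # uniform confidence before any tuning
print(mix_predictions(logits, weights).shape)  # torch.Size([4, 10])
```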
Lay Summary: Vision-language models are powerful, but adapting them to new tasks without retraining the entire model remains a challenge. Prompt tuning, which changes only the prompts the model receives, is efficient but often causes confusion between similar categories. We propose CoCoA-Mix, a method based on a mixture model that combines results from multiple prompts. It includes a confusion-aware loss (CoA-loss) to help the model avoid confusing similar categories, and confidence-aware weights (CoA-weights) that adjust each prediction based on how confident a prompt is for its class. CoCoA-Mix improves specialization and generalization over existing methods. Our code is publicly available for others to use and build upon.
Link To Code: https://github.com/url-kaist/CoCoA-Mix
Primary Area: Applications->Computer Vision
Keywords: Context optimization, Prompt tuning, Vision-language models
Submission Number: 47