Learning to Prompt for Vision-Language Emotion Recognition

Published: 01 Jan 2023 (ACIIW 2023), Last Modified: 13 Nov 2024. License: CC BY-SA 4.0
Abstract: In vision-language tasks, the choice of prompt plays a crucial role in determining model performance, particularly for complex tasks such as emotion recognition. Despite the promise shown by models like CLIP and CoOp, their performance varies significantly with the selection and adaptability of prompts. To address this challenge, this paper introduces a method built on the philosophy of learning to learn, yielding an emotion recognition model that dynamically optimizes prompt selection according to the demands of a given task. We empirically demonstrate that our approach outperforms established models such as CLIP and CoOp in both few-shot and zero-shot settings across three datasets, indicating its potential to improve the generalization and adaptation capabilities of vision-language emotion recognition models.
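To make the prompt-learning setup the abstract builds on concrete, below is a minimal PyTorch sketch of CoOp-style learnable context vectors attached to a frozen CLIP backbone. The emotion labels, context length, and initialization here are illustrative assumptions, not the paper's actual configuration, and the paper's learning-to-learn component for adapting prompts across tasks is not shown; this only sketches the baseline mechanism the method extends.

```python
import torch
import torch.nn as nn
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git


class LearnablePromptClassifier(nn.Module):
    """CoOp-style prompt learner: n_ctx learnable context vectors are
    prepended to each class-name embedding and pushed through the
    frozen CLIP text encoder."""

    def __init__(self, classnames, clip_model, n_ctx=16):
        super().__init__()
        self.clip_model = clip_model
        dtype = clip_model.dtype
        ctx_dim = clip_model.ln_final.weight.shape[0]

        # Learnable context vectors, shared across all classes.
        self.ctx = nn.Parameter(torch.randn(n_ctx, ctx_dim, dtype=dtype) * 0.02)

        # Tokenize "X X ... X <classname>." so sequence length and the
        # end-of-text (EOT) position match a real prompt.
        prompts = [" ".join(["X"] * n_ctx) + " " + name + "." for name in classnames]
        tokenized = torch.cat([clip.tokenize(p) for p in prompts])
        self.register_buffer("tokenized", tokenized)

        with torch.no_grad():
            embedding = clip_model.token_embedding(tokenized).type(dtype)
        # Keep the fixed parts: [SOS] prefix, and class name + [EOT] suffix.
        self.register_buffer("prefix", embedding[:, :1, :])
        self.register_buffer("suffix", embedding[:, 1 + n_ctx:, :])

    def encode_text_with_ctx(self):
        # Splice the learnable context between the fixed token embeddings.
        ctx = self.ctx.unsqueeze(0).expand(self.prefix.shape[0], -1, -1)
        x = torch.cat([self.prefix, ctx, self.suffix], dim=1)

        m = self.clip_model
        x = x + m.positional_embedding.type(m.dtype)
        x = m.transformer(x.permute(1, 0, 2)).permute(1, 0, 2)  # (L,N,D) layout
        x = m.ln_final(x).type(m.dtype)
        # Text feature is taken at the EOT token (highest token id).
        eot = self.tokenized.argmax(dim=-1)
        return x[torch.arange(x.shape[0]), eot] @ m.text_projection

    def forward(self, images):
        img_feat = self.clip_model.encode_image(images)
        txt_feat = self.encode_text_with_ctx()
        img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
        txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
        return self.clip_model.logit_scale.exp() * img_feat @ txt_feat.t()


# Illustrative emotion labels; the paper's datasets and classes may differ.
emotions = ["anger", "disgust", "fear", "happiness", "sadness", "surprise"]
model, _ = clip.load("ViT-B/32", device="cpu")
for p in model.parameters():
    p.requires_grad_(False)  # the CLIP backbone stays frozen
classifier = LearnablePromptClassifier(emotions, model)
```

In a few-shot setting, only `classifier.ctx` is updated by the optimizer, which is what makes prompt learning parameter-efficient; a learning-to-learn scheme like the paper's would additionally adapt how these context vectors are chosen or initialized per task.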
