Keywords: sentiment recognition, CLIP, prompt, computer vision
TL;DR: a short paper showing a lightweight method to prompt frozen CLIP
Abstract: Sentiment recognition has long been a promising yet challenging task in the field of artificial intelligence. With the rapid growth of the Internet of Things (IoT) in recent years, the demand for high-accuracy sentiment recognition is increasing quickly. However, the absence of massive sentiment recognition datasets significantly obstructs its development. Recently, Contrastive Language-Image Pre-training (CLIP) has offered a new way to compensate for the lack of sentiment recognition datasets with the massive general knowledge contained in CLIP. However, existing sentiment recognition models based on CLIP often feed it only unimodal prompts or asynchronous multimodal prompts, which can disrupt the balance of CLIP's multimodal structure and ultimately limit sentiment recognition accuracy. In this paper, I propose CLIP-SMP, a sentiment recognition model that uses CLIP with lightweight synchronous multimodal prompts. Through experiments on two sentiment-recognition benchmarks, I demonstrate the effectiveness and efficiency of CLIP-SMP, which needs only 2.5M trainable parameters while reaching state-of-the-art performance.
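The core idea of synchronous multimodal prompting can be sketched as follows. This is a minimal, illustrative PyTorch sketch, not the paper's actual implementation: it assumes learnable prompt tokens are prepended to both the text and image token sequences at the same step, so neither branch is prompted ahead of the other, while all CLIP encoder weights stay frozen. The class name, dimensions, and prompt length are hypothetical.

```python
import torch
import torch.nn as nn

class SynchronousMultimodalPrompts(nn.Module):
    """Illustrative sketch of lightweight synchronous prompts for a
    frozen two-tower (CLIP-style) model. Only the prompt vectors are
    trainable; the frozen text/image encoders (not shown) would then
    consume the prompted token sequences."""

    def __init__(self, n_prompts: int = 8, text_dim: int = 512, image_dim: int = 768):
        super().__init__()
        # One learnable prompt sequence per modality; both are injected
        # in sync so the multimodal balance of CLIP is preserved.
        self.text_prompts = nn.Parameter(torch.randn(n_prompts, text_dim) * 0.02)
        self.image_prompts = nn.Parameter(torch.randn(n_prompts, image_dim) * 0.02)

    def forward(self, text_tokens: torch.Tensor, image_tokens: torch.Tensor):
        # Prepend the prompts to each modality's token sequence at the
        # same point in the pipeline (synchronous prompting).
        b = text_tokens.shape[0]
        prompted_text = torch.cat(
            [self.text_prompts.expand(b, -1, -1), text_tokens], dim=1)
        prompted_image = torch.cat(
            [self.image_prompts.expand(b, -1, -1), image_tokens], dim=1)
        return prompted_text, prompted_image

# Only the prompts count toward the trainable-parameter budget; the
# CLIP backbone itself would be frozen (requires_grad_(False)).
prompts = SynchronousMultimodalPrompts()
n_trainable = sum(p.numel() for p in prompts.parameters() if p.requires_grad)
```

The lightweight nature of the method comes from this design: the trainable state is just the prompt matrices, a few thousand parameters per modality in this sketch, rather than any part of the CLIP backbone.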
Submission Number: 20