Keywords: sentiment recognition, CLIP, prompt, computer vision
TL;DR: a short paper showing a lightweight method to prompt frozen CLIP
Abstract: Sentiment recognition has long been a promising yet challenging task in the field of artificial intelligence. With the rapid growth of the Internet of Things (IoT) in recent years, the demand for high-accuracy sentiment recognition is increasing quickly. However, the absence of massive sentiment recognition datasets significantly obstructs its development. Recently, Contrastive Language-Image Pre-training (CLIP) has offered a new way to compensate for the lack of sentiment recognition datasets with the massive general knowledge contained in CLIP. However, existing sentiment recognition models based on CLIP often feed it only unimodal prompts or asynchronous multimodal prompts, which can disrupt the balance of CLIP's multimodal structure and ultimately limit sentiment recognition accuracy. In this paper, I propose CLIP-SMP, a sentiment recognition model that uses CLIP with lightweight synchronous multimodal prompts. Through experiments on two sentiment-recognition benchmarks, I demonstrate the effectiveness and efficiency of CLIP-SMP, which needs only 2.5M trainable parameters while reaching state-of-the-art performance.
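The core idea of synchronous multimodal prompting can be sketched as follows. This is a minimal, illustrative PyTorch sketch, not the paper's actual implementation: it assumes learnable prompt tokens are prepended to both the text and image token sequences at the same step, so neither branch is prompted ahead of the other, while all CLIP encoder weights stay frozen. The class name, dimensions, and prompt length are hypothetical.

```python
import torch
import torch.nn as nn

class SynchronousMultimodalPrompts(nn.Module):
    """Illustrative sketch of lightweight synchronous prompts for a
    frozen two-tower (CLIP-style) model. Only the prompt vectors are
    trainable; the frozen text/image encoders (not shown) would then
    consume the prompted token sequences."""

    def __init__(self, n_prompts: int = 8, text_dim: int = 512, image_dim: int = 768):
        super().__init__()
        # One learnable prompt sequence per modality; both are injected
        # in sync so the multimodal balance of CLIP is preserved.
        self.text_prompts = nn.Parameter(torch.randn(n_prompts, text_dim) * 0.02)
        self.image_prompts = nn.Parameter(torch.randn(n_prompts, image_dim) * 0.02)

    def forward(self, text_tokens: torch.Tensor, image_tokens: torch.Tensor):
        # Prepend the prompts to each modality's token sequence at the
        # same point in the pipeline (synchronous prompting).
        b = text_tokens.shape[0]
        prompted_text = torch.cat(
            [self.text_prompts.expand(b, -1, -1), text_tokens], dim=1)
        prompted_image = torch.cat(
            [self.image_prompts.expand(b, -1, -1), image_tokens], dim=1)
        return prompted_text, prompted_image

# Only the prompts count toward the trainable-parameter budget; the
# CLIP backbone itself would be frozen (requires_grad_(False)).
prompts = SynchronousMultimodalPrompts()
n_trainable = sum(p.numel() for p in prompts.parameters() if p.requires_grad)
```

The lightweight nature of the method comes from this design: the trainable state is just the prompt matrices, a few thousand parameters per modality in this sketch, rather than any part of the CLIP backbone.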
Submission Number: 20