Abstract: Online chatting has become an essential part of daily interaction, with stickers emerging as a popular tool for conveying emotions more vividly than plain text. While conventional image emotion recognition focuses on global features, sticker emotion recognition requires incorporating both global and local features, along with additional modalities such as text. To address this, we introduce a topic-ID-guided transformer method for a more nuanced analysis of stickers. Since each sticker belongs to a topic, and stickers sharing a topic depict the same object, we assign each sticker a topic ID and regard stickers with the same topic ID as its topic context. Our approach comprises a novel topic-guided context-aware module and a topic-guided attention mechanism, which extract comprehensive topic context features from stickers sharing the same topic ID and significantly improve emotion recognition accuracy. Moreover, we integrate a frequency linear attention module that leverages frequency-domain information to better capture object information in stickers, and a locally enhanced re-attention mechanism for improved local feature extraction. Extensive experiments and ablation studies on the large-scale sticker emotion dataset SER30K validate the efficacy of our method. Experimental results show that the proposed method achieves the best accuracy on both single-modal and multi-modal sticker emotion recognition.
Primary Subject Area: [Engagement] Emotional and Social Signals
Secondary Subject Area: [Content] Vision and Language
Relevance To Conference: In this work, we introduce the topic context of stickers to better capture the object information and the local features shared by stickers with the same topic. We then design a sticker emotion recognition model, TGCA-PVT, based on this topic context. Our contributions can be summarized as follows:
a) We propose a TGCA-Module and a TG-Attention mechanism based on the topic ID to mine the topic information shared by stickers with the same topic, as well as the locally enhanced features introduced by image transformations.
b) We design an FLA-Module to better capture frequency-domain information for sticker object feature extraction, and a LERA-Module to strengthen the model's ability to extract local details of stickers for better emotion recognition.
c) Extensive experiments and ablation studies are conducted on the public large-scale sticker emotion recognition dataset SER30K and the image emotion recognition dataset FI to evaluate our proposed method.
Supplementary Material: zip
Submission Number: 1895