Deconfounded Emotion Guidance Sticker Selection with Causal Inference

Published: 01 Jan 2024, Last Modified: 19 Feb 2025. Venue: ACM Multimedia 2024. License: CC BY-SA 4.0
Abstract: With the increasing popularity of online social applications, stickers have become common in online chats. Teaching a model to select the appropriate sticker from a set of candidates based on the dialogue context is important for optimizing the user experience. Existing methods leverage emotional information to facilitate sticker selection. However, because sticker images, emotionally loaded words in the dialogue, and emotion labels frequently co-occur, these methods tend to over-rely on this dataset bias and learn spurious correlations during training. As a result, they may select inappropriate stickers that do not match the user's intended expression. In this paper, we introduce a causal graph to explicitly identify the spurious correlations in the sticker selection task. Building on this analysis, we propose a Causal Knowledge-Enhanced Sticker Selection (CKS) model to mitigate them. Specifically, we design a knowledge-enhanced emotional utterance extractor to identify emotional information within dialogues. An interventional visual feature extractor then produces unbiased visual features and aligns them with the emotional utterance representations. Finally, a standard transformer encoder fuses the multimodal information for emotion recognition and sticker selection. Extensive experiments on the MOD dataset show that our CKS model significantly outperforms the baseline models.
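The abstract does not say how the intervention in the visual feature extractor is implemented. One common way to remove a confounder's influence in causal inference is backdoor adjustment, P(Y | do(X)) = Σ_z P(Y | X, z) P(z). The sketch below is a toy illustration of that general idea on discrete counts, not the CKS model itself. The variable roles (z as a confounding emotion label, x as a sticker feature, y as "sticker selected") and all function names are assumptions made for the example.

```python
from collections import Counter

def observational(samples, x, y):
    """Plain conditional estimate P(Y=y | X=x) from (z, x, y) tuples."""
    xs = [s for s in samples if s[1] == x]
    return sum(1 for s in xs if s[2] == y) / len(xs)

def backdoor_adjust(samples, x, y):
    """Estimate P(Y=y | do(X=x)) = sum_z P(y | x, z) * P(z)."""
    n = len(samples)
    z_counts = Counter(z for z, _, _ in samples)
    total = 0.0
    for z, count_z in z_counts.items():
        stratum = [s for s in samples if s[0] == z and s[1] == x]
        if not stratum:
            continue  # no data for X=x in this stratum, so it is skipped
        p_y_given_xz = sum(1 for s in stratum if s[2] == y) / len(stratum)
        total += p_y_given_xz * count_z / n
    return total

# Hypothetical toy data: (z, x, y) with z correlated with both x and y.
samples = (
    [(1, 1, 1)] * 6 + [(1, 1, 0)] * 2 + [(1, 0, 1)] * 1 + [(1, 0, 0)] * 1
    + [(0, 1, 1)] * 1 + [(0, 1, 0)] * 1 + [(0, 0, 1)] * 2 + [(0, 0, 0)] * 6
)

p_obs = observational(samples, x=1, y=1)    # inflated by co-occurrence with z
p_do = backdoor_adjust(samples, x=1, y=1)   # z is averaged out by its prior
print(p_obs, p_do)
```

In this toy data the observational estimate is higher than the adjusted one because x=1 mostly appears together with z=1, which is the kind of co-occurrence bias the abstract describes.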