Integrating Stickers into Multimodal Dialogue Summarization: A Novel Dataset and Approach for Enhancing Social Media Interaction
Abstract: With the popularity of the internet and social media, a growing number of online chats and comment replies take the form of multimodal dialogues that contain stickers. Automatically summarizing these dialogues can reduce content overload and save reading time. However, existing datasets and methods either address unimodal text dialogue summarization or target articles with real photos, for which they produce a text summary and extract key images separately. None of them considers automatic summarization of multimodal dialogues that contain sticker images in online chat scenarios. To address the lack of datasets and research in this field, we propose a new task and dataset, Multimodal Chat Dialogue Summarization Containing Stickers (MCDSCS). The dataset consists of 5,527 Chinese multimodal chat dialogues and 14,356 different sticker images, with stickers interspersed in the text of each dialogue to reflect real social media chat scenarios. MCDSCS also helps fill the gap in Chinese multimodal dialogue data. We use the GPT-4 model with carefully designed Chain-of-Thought (CoT) prompts, supplemented with manual review, to generate dialogues and extract summaries. We also propose a novel method that integrates the visual information of stickers with text descriptions of emotions and intentions (TEI). Experiments show that our method improves the performance of various mainstream summary generation models, which then outperform ChatGPT and some other multimodal models. Our data and code will be publicly available.
Primary Subject Area: [Engagement] Summarization, Analytics, and Storytelling
Secondary Subject Area: [Content] Multimodal Fusion
Relevance To Conference: This work targets social media scenarios. It proposes a multimodal dialogue summarization dataset containing sticker images and a method that integrates sticker image information into text to improve summarization performance. Both the dataset and the method can advance multimodal research on stickers in social media.
Supplementary Material: zip
Submission Number: 2051