Engaging Live Video Comments Generation

Published: 20 Jul 2024 · Last Modified: 21 Jul 2024 · MM 2024 Poster · CC BY 4.0
Abstract: Automatic live commenting is increasingly acknowledged as a crucial strategy for improving viewer interaction. However, current methods overlook the importance of generating engaging comments. Engaging comments not only attract widespread viewer attention, earning numerous "likes", but also promote subsequent social interaction around the comments. In this paper, we introduce a novel framework for generating engaging live video comments that resonate with viewers and enhance the viewing experience. We design a Competitive Context Selection Strategy to accelerate differential learning by constructing sample pairs with different levels of attractiveness. This strategy addresses the sample imbalance between highly-liked and low-liked comments, as well as the relative attractiveness of comments within the same video scene. Moreover, we develop a Semantic Gap Contrastive Loss that minimizes the distance between generated comments and higher-liked comments within a segment while widening the gap to lower-liked or unliked comments, guiding the model toward more engaging output. To support the proposed generation task, we construct a video comment dataset with "like" information, containing 180,000 comments and their "like" counts. Extensive experiments show that the comments generated by our method are more engaging, fluent, natural, and diverse than those produced by the baselines.
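As a rough illustration of the idea behind the Semantic Gap Contrastive Loss described above, the sketch below implements a margin-based triplet objective that pulls a generated comment's embedding toward a higher-liked comment and pushes it away from a lower-liked one. The class name, the margin value, and the use of cosine distance are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

class SemanticGapContrastiveLoss(torch.nn.Module):
    """Illustrative margin-based contrastive loss: pull generated-comment
    embeddings toward higher-liked comments from the same video segment and
    push them away from lower-liked (or unliked) ones. A sketch of the idea
    in the abstract, not the paper's exact objective."""

    def __init__(self, margin: float = 0.5):
        super().__init__()
        self.margin = margin

    def forward(self, generated, high_liked, low_liked):
        # Cosine distances between the generated comment embedding and the
        # higher-/lower-liked reference comment embeddings.
        d_pos = 1.0 - F.cosine_similarity(generated, high_liked, dim=-1)
        d_neg = 1.0 - F.cosine_similarity(generated, low_liked, dim=-1)
        # Triplet-style hinge: the distance to the higher-liked comment should
        # be smaller than the distance to the lower-liked one by the margin.
        return F.relu(d_pos - d_neg + self.margin).mean()


# Example usage on random embeddings (batch of 8, dimension 256).
if __name__ == "__main__":
    loss_fn = SemanticGapContrastiveLoss(margin=0.5)
    gen = torch.randn(8, 256)
    pos = torch.randn(8, 256)
    neg = torch.randn(8, 256)
    print(loss_fn(gen, pos, neg).item())
```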
Primary Subject Area: [Content] Vision and Language
Secondary Subject Area: [Experience] Multimedia Applications
Relevance To Conference: Our video comment generation work is a generative task conditioned on the video scene. A video scene provides multiple input signals, including image frames, subtitle text, and contextual comments, so the task can be understood as generating comment text under multi-modal conditions.
Submission Number: 2980