Large Language Models Are Natural Video Popularity Predictors

ACL ARR 2025 February Submission1095 Authors

12 Feb 2025 (modified: 09 May 2025) · ACL ARR 2025 February Submission · CC BY 4.0
Abstract: Predicting video popularity is often framed as a supervised learning task, relying heavily on meta-information and aggregated engagement data. However, video popularity is shaped by complex cultural and social factors that such approaches often overlook. We argue that Large Language Models (LLMs), with their deep contextual awareness, can better capture these nuances. To bridge the gap between pixel-based video data and token-based LLMs, we convert frame-level visuals into sequential text representations using Vision-Language Models. This enables LLMs to process multimodal content—titles, frame-based descriptions, and captions—capturing both engagement intensity (view count) and geographic spread (number of countries where a video trends). On 13,639 popular videos, a supervised neural network using content embeddings achieves 80% accuracy, while our LLM-based approach reaches 82% without fine-tuning. Combining the neural network's predictions with the LLM further improves accuracy to 85.5%. Moreover, the LLM generates interpretable, theory-grounded explanations for its predictions. Manual validation confirms the quality of these explanations and addresses concerns about hallucinations in the video-to-text conversion process. Overall, our findings suggest that LLMs, equipped with text-based multimodal representations, offer a powerful, interpretable, and data-efficient solution for tasks requiring rich contextual insight, such as video popularity prediction.
Paper Type: Long
Research Area: Computational Social Science and Cultural Analytics
Research Area Keywords: Large Language Models (LLMs), Vision-Language Models (VLMs), Multimodal Textual Representations, Video Popularity Prediction
Contribution Types: Publicly available software and/or pre-trained models, Data resources, Data analysis
Languages Studied: English
Submission Number: 1095