Feature Construction for Posts and Users Combined with LightGBM for Social Media Popularity Prediction

Published: 01 Jan 2019, Last Modified: 20 Oct 2024, Venue: ACM Multimedia 2019, License: CC BY-SA 4.0
Abstract: In this paper, we address the Social Media Prediction (SMP) Challenge using a regression model with multiple features extracted from various aspects of posts. More specifically, we extract textual features and numeric features, and construct user-related features. For textual features, the rich text of the posts is integrated to build a corpus, on which we train a language model to learn vector representations of semantic information. For numeric features, we construct several new features, including the length and the word count of the title. For user-related features, we design a "user id count", the number of times each user posted in the entire dataset, to reflect the activity of the user. Finally, the features are fed into LightGBM to predict popularity scores. Extensive experiments conducted on the Social Media Prediction Dataset show the superiority of our method. Our approach achieved 3rd place in the SMP Challenge.
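The numeric and user-related features described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the record keys `user_id` and `title`, the function names, and the LightGBM hyperparameters are assumptions made for illustration, and the textual (language-model) features are omitted.

```python
from collections import Counter


def build_features(posts):
    """Build one feature row per post.

    `posts` is a list of dicts with the hypothetical keys
    'user_id' and 'title' (assumed names, not from the paper).
    """
    # "User id count": how many posts each user has in the whole dataset.
    user_counts = Counter(p["user_id"] for p in posts)

    rows = []
    for p in posts:
        title = p["title"]
        rows.append([
            len(title),                 # title length in characters
            len(title.split()),         # number of whitespace-separated words
            user_counts[p["user_id"]],  # user activity ("user id count")
        ])
    return rows


def train_regressor(rows, scores):
    """Fit a LightGBM regressor on the feature rows.

    Imports are local so the feature code above runs without
    LightGBM installed. Hyperparameters are illustrative only.
    """
    import numpy as np
    import lightgbm as lgb

    model = lgb.LGBMRegressor(n_estimators=200, learning_rate=0.05)
    model.fit(np.asarray(rows, dtype=float), np.asarray(scores, dtype=float))
    return model


posts = [
    {"user_id": "u1", "title": "Sunset over the bay"},
    {"user_id": "u2", "title": "My cat"},
    {"user_id": "u1", "title": "Street food in Hanoi"},
]
features = build_features(posts)
print(features)
```

In a fuller pipeline, the learned text vectors would presumably be concatenated with these columns before calling `train_regressor`; how the authors combine them is not specified in the abstract.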