FAME: Fusion-Aware Multi-modal Ensemble for Social Media Popularity Prediction

Yan Zhuang, Wei Bai, Yanru Zhang, Minhao Liu, Jiawen Deng, Fuji Ren

Published: 27 Oct 2025, Last Modified: 23 Nov 2025. License: CC BY-SA 4.0
Abstract: As social media becomes a dominant platform for sharing content, predicting the popularity of user posts has become increasingly important for applications such as content recommendation, trend forecasting, and user engagement. However, this task is challenging due to the diverse and multimodal nature of social media posts, which often include unstructured text, images, and structured metadata. To address this challenge, we propose Fusion-Aware Multi-modal Ensemble (FAME), a framework that effectively captures and integrates diverse information sources within social media content. Unlike prior approaches that rely on a single model to process all modalities, FAME leverages four specialized predictors. Three of them (CatBoost, LightGBM, and AutoGluon) are tree-based models that excel at handling structured metadata and its interactions with unstructured features. The fourth is a denoising autoencoder (DAE), which learns robust joint representations from unstructured text and image data. These models are combined through a weighted ensemble strategy, allowing FAME to leverage the complementary strengths of different architectures. Experiments on the Social Media Prediction Dataset demonstrate that FAME significantly outperforms existing baselines, achieving state-of-the-art results and validating its effectiveness in modeling the complex, multimodal nature of social media content.
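The weighted ensemble described in the abstract can be sketched as follows. This is an illustrative example only, not the authors' implementation: the four prediction vectors and the weights are hypothetical placeholders standing in for the outputs of CatBoost, LightGBM, AutoGluon, and the DAE.

```python
import numpy as np

def weighted_ensemble(predictions, weights):
    """Combine per-model popularity predictions with fixed weights.

    predictions: array-like of shape (n_models, n_posts)
    weights: array-like of shape (n_models,); normalized to sum to 1
    """
    predictions = np.asarray(predictions, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize defensively
    # Weighted sum across models for each post
    return weights @ predictions

# Hypothetical popularity scores from the four predictors for three posts
preds = [
    [3.2, 5.1, 0.8],  # e.g. CatBoost
    [3.0, 4.9, 1.0],  # e.g. LightGBM
    [3.4, 5.3, 0.9],  # e.g. AutoGluon
    [2.8, 5.0, 1.1],  # e.g. DAE
]
w = [0.3, 0.25, 0.25, 0.2]  # hypothetical ensemble weights
print(weighted_ensemble(preds, w))
```

In practice such weights would be tuned on a validation split; the sketch only shows how heterogeneous model outputs can be fused into a single prediction per post.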