CLICK-ID MULTI: A Multimodal Dataset for Indonesian Clickbait Detection and Benchmarking

ACL ARR 2025 May Submission1449 Authors

17 May 2025 (modified: 03 Jul 2025), ACL ARR 2025 May Submission, CC BY 4.0
Abstract: Clickbait headlines attract user attention by exploiting curiosity gaps, often through sensational or misleading phrasing, while not necessarily conveying false information. Although clickbait contributes to the broader misinformation ecosystem, especially when amplified on social media, it remains underexplored in low-resource and multimodal settings. This paper introduces CLICK-ID MULTI, a new multimodal dataset for clickbait detection in Indonesian. It extends the original CLICK-ID dataset (William and Sari, 2020) by pairing 5,809 annotated news articles with associated images, enabling the development of multimodal models. Although smaller than the original text-only dataset, CLICK-ID MULTI supports models that outperform the best text-only baseline (F1 = 0.7365), achieving F1 scores up to 0.937 through image-text fusion. These findings highlight the importance of multimodal learning and language-specific pretraining for robust clickbait detection in low-resource languages. The dataset and code are publicly available at https://anonymous.4open.science/r/emnlp-2025-clickid-multi-8466.
Paper Type: Short
Research Area: Computational Social Science and Cultural Analytics
Research Area Keywords: misinformation detection and analysis
Contribution Types: NLP engineering experiment
Languages Studied: Indonesian
Submission Number: 1449