A simple contrastive embedding framework for low-resource fake news detection

Published: 01 Jan 2025, Last Modified: 20 Sept 2025 · Neural Comput. Appl. 2025 · CC BY-SA 4.0
Abstract: Low-resource fake news detection aims to discern between true and false claims in low-resource languages with scarce benchmark datasets. In this resource-constrained scenario, fake news data collected from online hoax-reporting systems are inherently skewed, because human fact checkers mainly sample claims that are more likely to be fake or false. Instead of training an end-to-end classifier on the extremely imbalanced dataset, our study investigates a simple framework based on contrastive learning and stacking-based ensemble learning as an alternative fake news classification pipeline for the Indonesian language. Our empirical results show that by combining a contrastive-based embedding model (Contrast-BERT) with an ensemble of multilayer perceptrons (MLPs) at the inference stage, we improve the precision score in fake news classification by up to 26.64%, while maintaining accuracy and recall scores above 75%, given an extreme class imbalance ratio of 1:24. Contrast-BERT also outperforms its counterparts in unsupervised topic clustering and evidence retrieval by nearly twofold. Furthermore, we observe that the contrastive-based model follows a similar performance trend on an Indonesian clickbait benchmark dataset: Contrast-BERT is more accurate and precise than an end-to-end BERT classifier by up to 47%, given training subsets with an extreme imbalance ratio of \(\ge\) 1:19.
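
The abstract describes the pipeline only at a high level. The sketch below is a minimal illustration, not the authors' exact method, of how such a two-stage pipeline could be wired together: a BERT-style encoder fine-tuned with a contrastive (triplet-style) objective, followed by a stacking ensemble of MLPs trained on the frozen embeddings. The model name, pooling choice, loss margin, and MLP sizes are all assumptions for illustration.

```python
# Minimal sketch of a contrastive-embedding + stacked-MLP pipeline.
# Assumptions: indobenchmark/indobert-base-p1 as the Indonesian encoder,
# mean pooling, a triplet-style contrastive loss, and illustrative MLP sizes.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

MODEL_NAME = "indobenchmark/indobert-base-p1"  # assumed; any Indonesian BERT checkpoint works
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
encoder = AutoModel.from_pretrained(MODEL_NAME)

def embed(texts, max_length=128):
    """Mean-pooled sentence embeddings from the (optionally fine-tuned) encoder."""
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=max_length, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state           # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1).float()       # (B, T, 1)
    return (hidden * mask).sum(1) / mask.sum(1)                # (B, H)

def contrastive_loss(anchor, positive, negative, margin=1.0):
    """Triplet-style objective: pull same-label pairs together and push
    different-label pairs apart by at least `margin` in cosine distance."""
    d_pos = 1 - F.cosine_similarity(anchor, positive)
    d_neg = 1 - F.cosine_similarity(anchor, negative)
    return F.relu(d_pos - d_neg + margin).mean()

# Stage 1 (not shown): fine-tune `encoder` with `contrastive_loss` on
# (anchor, positive, negative) triplets sampled from the labelled claims,
# then freeze it and use `embed` as a fixed feature extractor.

# Stage 2: stacking ensemble of MLPs over the frozen embeddings.
stack = StackingClassifier(
    estimators=[
        ("mlp1", MLPClassifier(hidden_layer_sizes=(256,), max_iter=500)),
        ("mlp2", MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=500)),
    ],
    final_estimator=LogisticRegression(class_weight="balanced"),
)

# Toy usage: X = embed(list_of_claims).numpy(); stack.fit(X, labels)
```

Decoupling the representation from the classifier in this way is one common approach under extreme class imbalance: the contrastive encoder is trained on relative similarity rather than class frequencies, and the lightweight ensemble can then be rebalanced or retrained cheaply on the frozen embeddings.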