Abstract: Automatic media bias classification studies typically focus on isolated sentences, which presents challenges when such methods are applied to full news articles. Article-level media bias classification offers a more practical and holistic approach. However, this area remains under-explored, partly due to the lack of datasets. Therefore, in this paper, we first release a reconstructed version of an existing dataset, consisting of full article texts and metadata. Second, we propose HT-MAGPIE, a hierarchical transformer for article-level media bias classification that leverages MAGPIE, a large-scale model pre-trained on bias-related tasks, to produce bias-aware representations. We demonstrate that HT-MAGPIE outperforms all baselines by at least 0.13% and surpasses fine-tuned BERT by 5.02% in F1 score. We also explore the correlation between outlet-level and article-level bias by comparing model performance with and without outlet metadata. Our findings indicate that including outlet metadata as an additional feature improves the F1 score of fine-tuned BERT by 4.32% and of BigBird by 2.62%.
Paper Type: Short
Research Area: NLP Applications
Research Area Keywords: rumor/misinformation detection, NLP datasets
Contribution Types: NLP engineering experiment, Data resources
Languages Studied: English
Submission Number: 1593