Abstract: Media framing is the strategic selection and presentation of specific aspects of political issues to shape public opinion. Despite its relevance to almost all societies around the world, research on framing has been limited by the lack of available datasets and other resources. This study explores dataset creation through crowdsourcing, using non-expert annotators to develop training corpora. We first extend framing analysis beyond English news to a multilingual context (12 typologically diverse languages) through automatic translation. We additionally present a novel benchmark in Bengali and Portuguese on the immigration and same-sex marriage domains. Finally, we show that a system trained on our crowdsourced dataset, combined with other existing ones, reaches an accuracy of 73.22%, a 5.32% increase over the baseline. Additionally, we find that models built with less data can significantly outperform systems trained on far more data in a multilingual evaluation setting.
Paper Type: long
Research Area: Computational Social Science and Cultural Analytics
Contribution Types: NLP engineering experiment, Approaches to low-resource settings, Data resources
Languages Studied: Bengali, English, German, Greek, Italian, Turkish, Nepali, Hindi, Portuguese, Telugu, Russian, Swahili, and Mandarin Chinese