ViFactCheck: Empowering Vietnamese Fact-Checking across Multiple Domains with a Comprehensive Benchmark Dataset and Methods

Anonymous

16 Dec 2023 · ACL ARR 2023 December Blind Submission · Readers: Everyone
Abstract: With the rapid development of online information platforms, barriers to the dissemination of information, particularly in the media, are diminishing. However, this has also given rise to problems such as the proliferation of fake news. High-quality datasets and robust fact-checking methods are therefore essential, especially for low-resource languages. This study presents ViFactCheck, the first publicly available benchmark dataset for Vietnamese fact-checking across multiple online news domains. Comprising 7,232 human-annotated statements drawn from reputable Vietnamese online news sources, the dataset covers 12 topics and was built through a strict data-construction process. We also evaluate state-of-the-art monolingual and multilingual pre-trained language models on ViFactCheck. On this dataset, XLM-R$_{large}$ outperforms strong baselines, including mBERT, XLM-R$_{base}$, PhoBERT$_{large}$, PhoBERT$_{base}$, and ViBERT, achieving a macro F1 score of 78.40%. These findings demonstrate the dataset's potential for practical applications.
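The headline result is reported as macro F1: the unweighted mean of the per-class F1 scores, so every verdict label counts equally no matter how frequent it is. Below is a minimal pure-Python sketch of that metric under stated assumptions. It is not the authors' evaluation script, and the label names used in the usage line (SUPPORTED, REFUTED, NEI) are hypothetical placeholders, not necessarily ViFactCheck's actual label set.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    # True positives per label: positions where prediction matches gold.
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    pred_count = Counter(y_pred)  # how often each label was predicted
    true_count = Counter(y_true)  # how often each label is the gold answer
    f1_scores = []
    for label in labels:
        precision = tp[label] / pred_count[label] if pred_count[label] else 0.0
        recall = tp[label] / true_count[label] if true_count[label] else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    # Macro averaging: each class contributes equally to the final score.
    return sum(f1_scores) / len(f1_scores)


# Hypothetical labels, for illustration only.
gold = ["SUPPORTED", "SUPPORTED", "REFUTED", "NEI"]
pred = ["SUPPORTED", "REFUTED", "REFUTED", "NEI"]
print(round(macro_f1(gold, pred), 4))  # 0.7778
```

In this toy example the per-class F1 scores are 1.0 (NEI), 2/3 (REFUTED) and 2/3 (SUPPORTED), giving a macro F1 of 7/9. Because the average is unweighted, a model cannot score well simply by favouring the majority class.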
Paper Type: long
Research Area: Efficient/Low-Resource Methods for NLP
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Approaches to low-resource settings, Approaches low compute settings-efficiency, Publicly available software and/or pre-trained models, Data resources, Data analysis
Languages Studied: Vietnamese