Keywords: fact checking, passage retrieval, misinformation detection, data-efficient training, less-resourced languages
TL;DR: SemViQA is a Vietnamese fact-checking framework that combines semantic evidence retrieval with hierarchical verdict classification. It achieves 78.97% strict accuracy on ISE-DSC01 and 80.82% on ViWikiFC, and took 1st place in the UIT Data Science Challenge. The SemViQA Faster variant improves inference speed by over 7×.
Abstract: Recent advances in LLMs have accelerated both the generation of information and the spread of misinformation, especially in low-resource languages like Vietnamese, motivating robust fact-checking systems. Existing methods struggle with semantic ambiguity, homonyms, and complex linguistic structures, and often trade accuracy for efficiency. We introduce SemViQA, a novel Vietnamese fact-checking framework that integrates Semantic-based Evidence Retrieval (SER) and Two-step Verdict Classification (TVC). Our approach balances precision and speed, achieving state-of-the-art results with 78.97% strict accuracy on ISE-DSC01 and 80.82% on ViWikiFC, and securing 1st place in the UIT Data Science Challenge. Additionally, SemViQA Faster improves inference speed 7× while maintaining competitive accuracy. SemViQA sets a new benchmark for Vietnamese fact verification, advancing the fight against misinformation.
Submission Type: Emerging
Copyright Form: pdf
Submission Number: 320