FEVEROUS: Fact Extraction and VERification Over Unstructured and Structured information

Published: 29 Jul 2021, Last Modified: 24 May 2023, NeurIPS 2021 Datasets and Benchmarks Track (Round 1), Readers: Everyone
Keywords: fact extraction and verification, structured information, unstructured and structured information, fact checking, natural language processing, information retrieval
TL;DR: The paper introduces a novel dataset for fact-checking claims using both unstructured and structured information from Wikipedia.
Abstract: Fact verification has attracted considerable attention in the machine learning and natural language processing communities, as it is one of the key methods for detecting misinformation. Existing large-scale benchmarks for this task have focused mostly on textual sources, i.e. unstructured information, and have thus ignored the wealth of information available in structured formats, such as tables. In this paper we introduce a novel dataset and benchmark, Fact Extraction and VERification Over Unstructured and Structured information (FEVEROUS), which consists of 87,026 verified claims. Each claim is annotated with evidence in the form of sentences and/or cells from tables in Wikipedia, as well as a label indicating whether this evidence supports, refutes, or does not provide enough information to reach a verdict. Furthermore, we detail our efforts to track and minimize biases in the dataset that could be exploited by models, e.g. to predict the label without using the evidence. Finally, we develop a baseline for verifying claims against text and tables which predicts both the correct evidence and the correct verdict for 18% of the claims.
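The 18% figure counts a claim only when both the verdict and the evidence are right. The Python sketch below illustrates what such a strict criterion could look like. It assumes, as a simplification that is not the official FEVEROUS scorer, that a prediction is correct when its label matches the gold label and its retrieved evidence contains at least one complete gold evidence set. The `Claim` record, the exact label strings, and the element IDs in the usage example are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# The three verdicts named in the abstract; these exact spellings are an assumption.
LABELS = {"SUPPORTS", "REFUTES", "NOT ENOUGH INFO"}


@dataclass
class Claim:
    """Hypothetical in-memory record for one annotated claim."""
    text: str
    label: str
    # Each gold evidence set is a group of element IDs (sentences and/or table cells)
    # that together justify the label. ID format here is illustrative only.
    evidence_sets: list[frozenset[str]] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.label not in LABELS:
            raise ValueError(f"unknown label: {self.label!r}")


def is_correct(claim: Claim, pred_label: str, pred_evidence: set[str]) -> bool:
    """Assumed strict criterion: label matches AND one full gold set was retrieved."""
    if pred_label != claim.label:
        return False
    # Subset test: every element of some gold evidence set appears in the prediction.
    return any(gold <= pred_evidence for gold in claim.evidence_sets)


def strict_score(claims: list[Claim], predictions: list[tuple[str, set[str]]]) -> float:
    """Fraction of claims whose predicted label and evidence are both correct."""
    if not claims:
        return 0.0
    hits = sum(is_correct(c, lbl, ev) for c, (lbl, ev) in zip(claims, predictions))
    return hits / len(claims)
```

For example, with hypothetical IDs, a claim whose only gold set is `{"s1", "c2"}` is not counted if the system retrieves just `{"s1"}`, even with the right label, whereas retrieving `{"c4", "c5", "s9"}` does satisfy a gold set `{"c4", "c5"}`: extra retrieved elements are tolerated here, but missing ones are not.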
Supplementary Material: zip
URL: Dataset (train + dev + unlabeled test set) and Wikipedia data can be downloaded here: https://fever.ai/dataset/feverous.html.
Contribution Process Agreement: Yes
Dataset Url: https://fever.ai/dataset/feverous.html
License: These data annotations incorporate material from Wikipedia, which is licensed pursuant to the Wikipedia Copyright Policy. These annotations are made available under the license terms described on the applicable Wikipedia article pages, or, where Wikipedia license terms are unavailable, under the Creative Commons Attribution-ShareAlike License (version 3.0), available at http://creativecommons.org/licenses/by-sa/3.0/ (collectively, the "License Terms"). You may not use these files except in compliance with the applicable License Terms.
Author Statement: Yes