Abstract: Wikipedia is a critical resource for modern NLP, serving as a rich source of current, citation-backed information on a wide variety of subjects. The reliability of Wikipedia, that is, its groundedness in its cited sources, is vital to this purpose. This work provides a quantitative analysis of the extent to which Wikipedia *is* so grounded and of how readily grounding evidence may be retrieved. To this end, we introduce PeopleProfiles, a large-scale, multi-level dataset of claim support annotations on Wikipedia articles about notable people. We show that a surprising proportion of Wikipedia claims (20-27%) are in fact *unsupported* by publicly accessible sources and, further, that recovering complex grounding evidence for claims that *are* supported remains a challenge for standard retrieval methods.
Paper Type: Short
Research Area: Resources and Evaluation
Research Area Keywords: NLP datasets, evaluation, automatic creation and evaluation of language resources
Contribution Types: NLP engineering experiment, Data resources, Data analysis
Languages Studied: English
Submission Number: 4596