Annotated Gigaword

Courtney Napoles, Matthew R. Gormley, Benjamin Van Durme

2012 (modified: 16 Jul 2019), AKBC-WEKEX@NAACL-HLT 2012

Abstract: We have created layers of annotation on the English Gigaword v.5 corpus to make it useful as a standardized corpus for knowledge extraction and distributional semantics. Most existing large-scale work is based on inconsistent corpora that have often had to be re-annotated independently by each research team. Each re-annotation introduces its own biases, so results are comparable only at a high level. We provide the community with a public reference set based on current state-of-the-art syntactic analysis and coreference resolution, along with an interface for programmatic access. Our goal is to enable broader involvement in large-scale knowledge-acquisition efforts by researchers who might otherwise lack the ability to produce such a resource on their own.
