A New Dataset for Fine-Grained Citation Field Extraction
Sam Anzaroot, Andrew McCallum
May 10, 2013 (modified: May 10, 2013)
ICML 2013 PeerReview submission
Readers: everyone
Abstract: Citation field extraction entails segmenting a citation string into its constituent parts, such as title, authors, publisher, and year. Despite the importance of this task, there is a lack of well-annotated citation data. This paper presents a new labeled dataset for citation extraction that, in comparison to the previous standard dataset, contains more than four times as much data, supplies detailed nested labels rather than coarse-grained flat labels, and is derived from four different academic fields rather than one. We describe our new dataset in detail and provide baseline experimental results from a state-of-the-art extraction method.
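To make the distinction between flat and nested labels concrete, the following is a minimal sketch in Python. The citation text, the label names (`authors`, `person`, `first`, `last`, `date`, `year`), and the `Span` and `flatten` helpers are all hypothetical illustrations, not the dataset's actual schema or format.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Span:
    """A labeled piece of a citation string; children hold finer-grained labels."""
    label: str
    text: str
    children: List["Span"] = field(default_factory=list)


def flatten(span: Span, prefix: Tuple[str, ...] = ()) -> Iterator[Tuple[str, str]]:
    """Yield (label_path, text) for every leaf span, depth first."""
    path = prefix + (span.label,)
    if not span.children:
        yield "/".join(path), span.text
    else:
        for child in span.children:
            yield from flatten(child, path)


# Flat labeling: one coarse label per segment (hypothetical example).
flat = [("authors", "J. Smith"), ("title", "An Example Title"), ("year", "2001")]

# Nested labeling: the same segments, with finer structure inside them.
nested = [
    Span("authors", "J. Smith", [
        Span("person", "J. Smith", [Span("first", "J."), Span("last", "Smith")]),
    ]),
    Span("title", "An Example Title"),
    Span("date", "2001", [Span("year", "2001")]),
]

# Collect the leaf labels with their full path through the hierarchy.
leaves = [pair for span in nested for pair in flatten(span)]
```

In this sketch, a flat scheme can only say that "J. Smith" is an author segment, while the nested scheme also records which token is the first name and which is the last name.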