A New Dataset for Fine-Grained Citation Field Extraction

23 Apr 2024 (modified: 10 May 2013)
ICML 2013 PeerReview submission
Readers: Everyone
Decision: oral
Abstract: Citation field extraction entails segmenting a citation string into its constituent parts, such as title, authors, publisher, and year. Despite the importance of this task, well-annotated citation data is scarce. This paper presents a new labeled dataset for citation extraction that, in comparison to the previous standard dataset, contains more than four times as much data, supplies detailed nested labels rather than coarse-grained flat labels, and is derived from four different academic fields rather than one. We describe the new dataset in detail and provide baseline experimental results from a state-of-the-art extraction method.
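
To make the contrast between flat and nested labels concrete, here is a minimal Python sketch. It is an illustration only: the `Span` class, the label names (`authors`, `person`, `first`, `last`, and so on), and the sample citation are all hypothetical and are not taken from the dataset's actual tag set or annotation format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Span:
    """One labeled segment of a citation string (hypothetical structure)."""
    label: str
    text: str
    children: List["Span"] = field(default_factory=list)


def flat_labels(spans: List[Span]) -> List[Tuple[str, str]]:
    """Coarse-grained view: keep only the top-level label of each segment."""
    return [(s.label, s.text) for s in spans]


def nested_paths(span: Span, prefix: str = "") -> List[Tuple[str, str]]:
    """Fine-grained view: one (label path, text) pair per leaf segment."""
    path = f"{prefix}/{span.label}" if prefix else span.label
    if not span.children:
        return [(path, span.text)]
    leaves: List[Tuple[str, str]] = []
    for child in span.children:
        leaves.extend(nested_paths(child, path))
    return leaves


# Made-up annotation of the made-up citation
# "J. Smith and A. Jones. An example title. Example Press, 2010."
annotation = [
    Span("authors", "J. Smith and A. Jones", [
        Span("person", "J. Smith", [Span("first", "J."), Span("last", "Smith")]),
        Span("person", "A. Jones", [Span("first", "A."), Span("last", "Jones")]),
    ]),
    Span("title", "An example title"),
    Span("publisher", "Example Press"),
    Span("year", "2010"),
]

if __name__ == "__main__":
    print(flat_labels(annotation))
    print([leaf for span in annotation for leaf in nested_paths(span)])
```

In the flat view the whole author list is a single `authors` segment, whereas the nested view also exposes each person and the parts of each name, which is the kind of extra detail nested labels can carry.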