Topic Sentence Named Entity Recognition: A New Task with Its Dataset and Benchmarks


16 Nov 2021 (modified: 05 May 2023), ACL ARR 2021 November Blind Submission, Readers: Everyone
Abstract: In this paper, we focus on a new type of named entity recognition (NER) task called topic sentence NER. A topic sentence is a short, compact sentence that summarizes a long document; for example, a title can be seen as the topic sentence of its article. Topic sentence NER aims to extract the named entities in a topic sentence, given the corresponding unlabeled document as a reference. This task represents real-world scenarios where full-document NER is too expensive and the entities in the topic sentence alone are sufficient for downstream tasks. To support this task, we construct a large-scale human-annotated topic sentence NER dataset named TSNER. The dataset contains 12,000 annotated sentences, each accompanied by its unlabeled document. Based on TSNER, we propose a family of representative, strong baseline models that can utilize both single-sentence and document-level features. We will make the dataset public in the hope of advancing research on the topic sentence NER task.
