Abstract: Named Entity Recognition (NER) seeks to extract entity mentions of predefined categories, such as Person and Location, from text. General-domain NER datasets like CoNLL-2003 mostly annotate Location entities at a coarse-grained level (e.g., a country or a city). However, many applications need to identify fine-grained locations in text and map them precisely to geographic sites (e.g., a crossroad or a store). We therefore propose HarveyNER, a new NER dataset with fine-grained locations annotated in tweets. The dataset presents unique challenges: it contains many complex and long location mentions written in informal descriptions. Because Curriculum Learning can help a system better learn hard samples, we adopt it: we first design two heuristic curricula based on the characteristic difficulties of HarveyNER, and then propose a novel curriculum that takes the commonness of sample difficulty into account. Our curricula are simple yet effective, and experimental results show that our methods improve both hard-case and overall performance on HarveyNER over strong baselines without extra cost.
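To make the idea of a difficulty-based curriculum concrete, here is a minimal sketch of one generic heuristic curriculum, not the curricula proposed in the paper. It assumes, purely for illustration, that sample difficulty can be approximated by the length (in tokens) of the longest location mention, and that training proceeds in "baby steps" stages that each add the next-hardest slice of data. The BIO labels `B-LOC`/`I-LOC`, the helper names `longest_location_span` and `baby_steps_curriculum`, and the example tweets are all hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical sample format: (tokens, BIO tags). Label names are assumptions.
Sample = Tuple[List[str], List[str]]


def longest_location_span(tags: List[str]) -> int:
    """Return the token length of the longest B-LOC/I-LOC span.

    Used here as an illustrative difficulty proxy (longer mentions = harder).
    """
    longest = current = 0
    for tag in tags:
        if tag == "B-LOC":
            current = 1  # a new location mention starts
        elif tag == "I-LOC" and current > 0:
            current += 1  # the current mention continues
        else:
            current = 0  # outside any mention (or an orphan I-LOC)
        longest = max(longest, current)
    return longest


def baby_steps_curriculum(samples: List[Sample], num_stages: int) -> List[List[Sample]]:
    """Split samples into cumulative training stages, easiest first.

    Stage i contains the easiest i/num_stages of the data, so the final
    stage trains on everything. This schedule is an assumption for the sketch.
    """
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    # Stable sort: ties keep their original order.
    ordered = sorted(samples, key=lambda s: longest_location_span(s[1]))
    bucket_size = math.ceil(len(ordered) / num_stages)
    return [ordered[: min(stage * bucket_size, len(ordered))]
            for stage in range(1, num_stages + 1)]


if __name__ == "__main__":
    tweets = [
        (["Flooding", "on", "I-10", "near", "Katy", "Freeway"],
         ["O", "O", "B-LOC", "O", "B-LOC", "I-LOC"]),
        (["Stay", "safe", "Houston"], ["O", "O", "B-LOC"]),
        (["Help", "at", "corner", "of", "Main", "and", "5th", "St"],
         ["O", "O", "B-LOC", "I-LOC", "I-LOC", "I-LOC", "I-LOC", "I-LOC"]),
    ]
    for i, stage in enumerate(baby_steps_curriculum(tweets, num_stages=3), 1):
        print(i, [" ".join(tokens) for tokens, _ in stage])
```

Sorting by a single surface feature is the crudest possible heuristic; the paper's curricula, and in particular its curriculum based on the commonness of sample difficulty, are not reproduced here. In practice one would train the tagger for some epochs on each stage before moving to the next.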
Paper Type: long