TweetNERD - End to End Entity Linking Benchmark for Tweets

06 Jun 2022, 15:27 (modified: 16 Jan 2023, 10:29) NeurIPS 2022 Datasets and Benchmarks Readers: Everyone
Keywords: Twitter, Social Media, Named Entity Recognition, Named Entity Disambiguation, Entity Linking, Wikification, Dataset
TL;DR: TweetNERD is a dataset of 340K+ Tweets for benchmarking Named Entity Recognition and Disambiguation systems on English Tweets.
Abstract: Named Entity Recognition and Disambiguation (NERD) systems are foundational for information retrieval, question answering, event detection, and other natural language processing (NLP) applications. We introduce TweetNERD, a dataset of 340K+ Tweets spanning 2010-2021, for benchmarking NERD systems on Tweets. It is the largest and most temporally diverse open-source benchmark dataset for NERD on Tweets and can be used to facilitate research in this area. We describe the evaluation setup with TweetNERD for three NERD tasks: Named Entity Recognition (NER), Entity Linking with True Spans (EL), and End to End Entity Linking (End2End); and report the performance of existing publicly available methods on specific TweetNERD splits. TweetNERD is available under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.
Supplementary Material: pdf
Dataset Url: More Details at:
License: Creative Commons Attribution 4.0 (CC-BY-4.0)
Author Statement: Yes
Contribution Process Agreement: Yes
In Person Attendance: Yes