Entity6K: A Large Open-Domain Evaluation Dataset for Real-World Entity Recognition

ACL ARR 2024 April Submission168 Authors

14 Apr 2024 (modified: 11 Jun 2024). ACL ARR 2024 April Submission. Readers: Everyone, Ethics Reviewers, Ethics Chairs. License: CC BY 4.0
Abstract: Open-domain real-world entity recognition is essential yet challenging, as it involves identifying a wide variety of entities in diverse environments. Progress in this field has been held back by the lack of a suitable evaluation dataset, which is hard to build because of the vast number of entities and the extensive human effort required for data curation. We introduce Entity6K, a comprehensive dataset for real-world entity recognition, featuring 5,700 entities across 26 categories, each supported by 5 human-verified images with annotations. Entity6K offers a diverse range of entity names and categorizations, addressing a gap in existing datasets. We benchmark existing models on tasks such as image captioning, object detection, zero-shot classification, and dense captioning to demonstrate Entity6K's effectiveness in evaluating models' entity recognition capabilities. We believe Entity6K will be a valuable resource for advancing accurate entity recognition in open-domain settings.
Paper Type: Short
Research Area: Resources and Evaluation
Research Area Keywords: Open-world entity recognition
Contribution Types: NLP engineering experiment, Data resources
Languages Studied: English
Submission Number: 168