IRIS: Rapid Curation Framework for Iterative Improvement of Noisy Named Entity Annotations

Published: 01 Jan 2025, Last Modified: 09 Sept 2025, Venue: NLDB (2) 2025, License: CC BY-SA 4.0
Abstract: We propose IRIS (IteRative Improvement dS-ner), a rapid curation framework for the iterative improvement of noisy named entity annotations. Unlike many existing entity annotation tools, which focus primarily on annotation from scratch, IRIS is designed for efficiently curating annotations initially generated by distantly supervised named entity recognition (DS-NER). It does so through robust annotation search and through automatic annotation suggestions from an NER model, which is initialized with the DS-NER model and incrementally updated during annotation. We also adopt Active Learning (AL), which lets curators work on documents suggested by the system rather than selecting them manually. Finally, we analyze the trade-offs among strategies for iteratively updating the NER model, including how training samples are selected and whether the new model is fine-tuned from the previous model or trained from scratch.
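
As an illustration of how a system might suggest documents to curators, the following is a minimal sketch of one common AL query strategy, least-confidence sampling. The abstract does not say which strategy IRIS uses, so this choice is an assumption, and the names `document_uncertainty`, `suggest_documents`, and the toy `pool` are hypothetical and not part of IRIS.

```python
from typing import Dict, List


def document_uncertainty(token_confidences: List[float]) -> float:
    """Least-confidence score: 1 minus the mean per-token max probability.

    Assumption: the NER model exposes, for each token, the probability of
    its most likely label. Empty documents get a score of 0.0.
    """
    if not token_confidences:
        return 0.0
    return 1.0 - sum(token_confidences) / len(token_confidences)


def suggest_documents(pool: Dict[str, List[float]], k: int) -> List[str]:
    """Return the ids of the k documents the model is least confident about."""
    ranked = sorted(
        pool,
        key=lambda doc_id: document_uncertainty(pool[doc_id]),
        reverse=True,  # most uncertain first
    )
    return ranked[:k]


# Toy pool (hypothetical values): document id -> per-token confidences.
pool = {
    "doc_a": [0.99, 0.98, 0.97],
    "doc_b": [0.55, 0.60, 0.90],
    "doc_c": [0.80, 0.85, 0.75],
}
suggested = suggest_documents(pool, k=2)
```

In a loop of this kind, the curator would correct the suggested documents, the NER model would be updated on the corrected annotations, and the remaining pool would be re-scored before the next round.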