Uncertainty Quantification for Named Entity Recognition via Conformal Prediction
TL;DR: We present a conformal prediction framework for NER that produces prediction sets with guaranteed coverage, acting as a sequence-labeling analogue of confidence intervals.
Abstract: Named Entity Recognition (NER) is a foundational component in many language tasks, such as knowledge graph construction, information extraction, and question answering. However, existing NER models typically output a single predicted label sequence without any quantification of uncertainty, leaving downstream applications vulnerable to cascading errors. We introduce a conformal prediction framework for NER that produces prediction sets over full label sequences with finite-sample coverage guarantees, serving an analogous role to confidence intervals in classical statistics. To improve efficiency, we propose three innovations: (i) hybrid probability-index nonconformity scores, (ii) conditional calibration across strata such as sentence length and language, and (iii) an adaptation of the RAPS procedure to sequence labeling. These techniques mitigate the problem of overly large prediction sets while maintaining valid coverage. Experiments on CoNLL++, CoNLL-Reduced, and WikiNEuRal benchmarks demonstrate that our methods consistently achieve the target confidence while producing efficient prediction sets across diverse base models. This work establishes a statistically principled approach to uncertainty-aware NER with direct benefits for downstream knowledge-driven NLP systems.
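To make the core mechanism concrete, here is a minimal sketch of split conformal prediction applied per token, in the spirit of the framework the abstract describes. This is an illustrative assumption, not the paper's actual method: the paper operates on full label sequences with hybrid probability-index scores and a RAPS adaptation, whereas this sketch uses the simplest nonconformity score, one minus the softmax probability, and a toy calibration set.

```python
import numpy as np

def conformal_quantile(cal_scores, alpha):
    # Split conformal: the ceil((n+1)(1-alpha))/n empirical quantile of the
    # calibration scores yields finite-sample marginal coverage >= 1 - alpha.
    n = len(cal_scores)
    q_level = np.ceil((n + 1) * (1 - alpha)) / n
    return np.quantile(cal_scores, min(q_level, 1.0), method="higher")

def prediction_set(label_probs, qhat):
    # Include every label whose nonconformity score (1 - probability)
    # falls at or below the calibrated threshold.
    return [k for k, p in enumerate(label_probs) if (1 - p) <= qhat]

# Toy calibration data standing in for held-out true-label probabilities.
rng = np.random.default_rng(0)
cal_scores = 1 - rng.beta(8, 2, size=500)
qhat = conformal_quantile(cal_scores, alpha=0.1)

# Hypothetical per-token softmax over NER labels (O, B-PER, I-PER, B-LOC).
token_probs = [0.70, 0.20, 0.06, 0.04]
print(prediction_set(token_probs, qhat))
```

A low threshold yields small (efficient) sets; a high threshold inflates them, which is exactly the set-size problem the paper's three refinements target while keeping the coverage guarantee intact.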
Submission Number: 1398