Reference-free Quality Estimation of Entity Recognition and Linking over OCRed Historical Texts

ACL ARR 2025 July Submission178 Authors

24 Jul 2025 (modified: 19 Aug 2025) · ACL ARR 2025 July Submission · CC BY 4.0
Abstract: Named Entity Recognition (NER) and Named Entity Linking (NEL) are core tasks in entity extraction, yet their robustness is limited when applied to noisy documents, such as those produced by Optical Character Recognition (OCR) over historical documents. Although large language models (LLMs) have shown strong zero-shot and few-shot performance on NER and NEL tasks, prior work has largely focused on using LLMs as direct predictors. In this study, we investigate the feasibility of using LLMs as evaluators to estimate the quality of NER/NEL outputs in the absence of human-annotated ground truth. Focusing on OCRed texts, where gold labels are scarce, we design supervised methods to improve quality judgments from LLMs and systematically compare their alignment with gold labels. Experiments on the HIPE-2020 benchmark across English, French, and German demonstrate that fine-tuned LLMs provide reliable estimates of output quality. Our findings suggest that LLM-based evaluation can support quality control and enable evaluation in noisy settings.
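The abstract frames reference-free quality estimation as supervised regression: a model is fine-tuned to predict a quality score for an NER/NEL output without gold labels at inference time. The following is a minimal sketch of that setup, not the authors' implementation; the backbone (xlm-roberta-base), the serialization of system output, and the score target (e.g., span-level F1) are illustrative assumptions, and the single-output regression head would first need fine-tuning on sentences paired with gold quality scores.

# Minimal sketch (not the paper's code) of regression-based quality
# estimation for NER/NEL over OCR-noisy text. Assumptions: multilingual
# backbone for EN/FR/DE, text-pair input, scalar quality target.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL = "xlm-roberta-base"  # illustrative multilingual backbone
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL, num_labels=1, problem_type="regression"
)  # regression head is randomly initialized until fine-tuned

def estimate_quality(ocr_text: str, predicted_entities: str) -> float:
    """Predict a quality score for a system's NER/NEL output, no gold labels."""
    # Pair the OCRed sentence with a serialization of the system output.
    inputs = tokenizer(ocr_text, predicted_entities,
                       truncation=True, return_tensors="pt")
    with torch.no_grad():
        return model(**inputs).logits.squeeze().item()

# Hypothetical usage on an OCR-noisy sentence:
score = estimate_quality(
    "Tbe city of Par1s hosted the congress.",
    "Par1s -> LOC -> wikidata:Q90",
)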
Paper Type: Long
Research Area: Information Extraction
Research Area Keywords: Information Extraction, OCRed Historical Texts, Named Entity Recognition, Named Entity Linking, Large Language Models, Regression
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Approaches to low-resource settings
Languages Studied: French, German, English
Reassignment Request Area Chair: This is not a resubmission
Reassignment Request Reviewers: This is not a resubmission
A1 Limitations Section: This paper has a limitations section.
A2 Potential Risks: N/A
B Use Or Create Scientific Artifacts: Yes
B1 Cite Creators Of Artifacts: Yes
B1 Elaboration: 3, 4
B2 Discuss The License For Artifacts: Yes
B2 Elaboration: 8
B3 Artifact Use Consistent With Intended Use: Yes
B3 Elaboration: 8
B4 Data Contains Personally Identifying Info Or Offensive Content: N/A
B5 Documentation Of Artifacts: Yes
B5 Elaboration: 4
B6 Statistics For Data: Yes
B6 Elaboration: 4
C Computational Experiments: Yes
C1 Model Size And Budget: N/A
C2 Experimental Setup And Hyperparameters: N/A
C3 Descriptive Statistics: N/A
C4 Parameters For Packages: Yes
C4 Elaboration: 4
D Human Subjects Including Annotators: No
D1 Instructions Given To Participants: N/A
D2 Recruitment And Payment: N/A
D3 Data Consent: N/A
D4 Ethics Review Board Approval: N/A
D5 Characteristics Of Annotators: N/A
E AI Assistants In Research Or Writing: Yes
E1 Information About Use Of AI Assistants: N/A
Author Submission Checklist: Yes
Submission Number: 178