Reference-free Quality Estimation of Entity Recognition and Linking over OCRed Historical Texts

ACL ARR 2025 July Submission178 Authors

24 Jul 2025 (modified: 19 Aug 2025) · ACL ARR 2025 July Submission · CC BY 4.0
Abstract: Named Entity Recognition (NER) and Named Entity Linking (NEL) are core tasks in entity extraction, yet their robustness is limited when applied to noisy documents, such as those produced by Optical Character Recognition (OCR) over historical documents. Although large language models (LLMs) have shown strong zero-shot and few-shot performance on NER and NEL tasks, prior work has largely focused on using LLMs as direct predictors. In this study, we investigate the feasibility of using LLMs as evaluators to estimate the quality of NER/NEL outputs in the absence of human-annotated ground truth. Focusing on OCRed texts, where gold labels are scarce, we design supervised methods to improve quality judgments from LLMs and systematically compare their alignment with gold labels. Experiments on the HIPE-2020 benchmark across English, French, and German demonstrate that fine-tuned LLMs provide reliable estimates of output quality. Our findings suggest that LLM-based evaluation can support quality control and enable evaluation in noisy settings.
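The abstract frames reference-free quality estimation as supervised regression: a model is fine-tuned to predict a quality score for an NER/NEL output without gold labels at inference time. The following is a minimal sketch of that setup, not the authors' implementation; the backbone (xlm-roberta-base), the serialization of system output, and the score target (e.g., span-level F1) are illustrative assumptions, and the single-output regression head would first need fine-tuning on sentences paired with gold quality scores.

# Minimal sketch (not the paper's code) of regression-based quality
# estimation for NER/NEL over OCR-noisy text. Assumptions: multilingual
# backbone for EN/FR/DE, text-pair input, scalar quality target.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL = "xlm-roberta-base"  # illustrative multilingual backbone
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL, num_labels=1, problem_type="regression"
)  # regression head is randomly initialized until fine-tuned

def estimate_quality(ocr_text: str, predicted_entities: str) -> float:
    """Predict a quality score for a system's NER/NEL output, no gold labels."""
    # Pair the OCRed sentence with a serialization of the system output.
    inputs = tokenizer(ocr_text, predicted_entities,
                       truncation=True, return_tensors="pt")
    with torch.no_grad():
        return model(**inputs).logits.squeeze().item()

# Hypothetical usage on an OCR-noisy sentence:
score = estimate_quality(
    "Tbe city of Par1s hosted the congress.",
    "Par1s -> LOC -> wikidata:Q90",
)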
Paper Type: Long
Research Area: Information Extraction
Research Area Keywords: Information Extraction, OCRed Historical Texts, Named Entity Recognition, Named Entity Linking, Large Language Models, Regression
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Approaches to low-resource settings
Languages Studied: French, German, English
Reassignment Request Area Chair: This is not a resubmission
Reassignment Request Reviewers: This is not a resubmission
A1 Limitations Section: This paper has a limitations section.
A2 Potential Risks: N/A
B Use Or Create Scientific Artifacts: Yes
B1 Cite Creators Of Artifacts: Yes
B1 Elaboration: 3, 4
B2 Discuss The License For Artifacts: Yes
B2 Elaboration: 8
B3 Artifact Use Consistent With Intended Use: Yes
B3 Elaboration: 8
B4 Data Contains Personally Identifying Info Or Offensive Content: N/A
B5 Documentation Of Artifacts: Yes
B5 Elaboration: 4
B6 Statistics For Data: Yes
B6 Elaboration: 4
C Computational Experiments: Yes
C1 Model Size And Budget: N/A
C2 Experimental Setup And Hyperparameters: N/A
C3 Descriptive Statistics: N/A
C4 Parameters For Packages: Yes
C4 Elaboration: 4
D Human Subjects Including Annotators: No
D1 Instructions Given To Participants: N/A
D2 Recruitment And Payment: N/A
D3 Data Consent: N/A
D4 Ethics Review Board Approval: N/A
D5 Characteristics Of Annotators: N/A
E AI Assistants In Research Or Writing: Yes
E1 Information About Use Of AI Assistants: N/A
Author Submission Checklist: Yes
Submission Number: 178