Keywords: Annotation Guidelines, Annotation Moderation, Large Language Models, Biomedical Named Entity Recognition
Abstract: While Large Language Models (LLMs) demonstrate remarkable zero-shot annotation capabilities, they often struggle with the specialized conventions of gold-standard benchmarks. We propose the systematic reuse and refinement of annotation guidelines as an alignment mechanism, introducing an iterative moderation framework that simulates the early phases of annotation projects. We evaluate three hypotheses: (1) the efficacy of guideline integration, (2) the advantage of reasoning-optimized models, and (3) the viability of moderation under minimal supervision. Testing across biomedical NER tasks (NCBI Disease, BC5CDR, BioRED) with three LLM families (GPT, Gemini, DeepSeek), our results empirically confirm all three hypotheses. While the iterative moderation framework shows strong potential for effectively refining guidelines, our analysis also reveals significant room for improvement.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: automatic creation and evaluation of language resources
Contribution Types: NLP engineering experiment, Approaches to low-resource settings, Data analysis
Languages Studied: English
Submission Number: 5933