Training-Free Voter-Adjudicator Framework for Schema-Guided Biomedical Annotation
Keywords: Multi-Agent Systems, Biomedical NLP, Human-in-the-Loop, Data Annotation, Epistemic Uncertainty, Information Extraction
TL;DR: A human-inspired ‘Voter-Adjudicator’ multi-agent framework that uses persona-driven diversity can improve biomedical text annotation and accurately escalate ambiguous instances to human experts.
Abstract: Scientific discovery is a longstanding aspiration in artificial intelligence and has recently catalyzed the vibrant field of AI for Science. However, bridging AI capabilities with the vast human knowledge embedded in published literature remains a key challenge. In biomedical research, where discoveries directly affect human health and quality of life, findings from millions of articles remain unexplored because they lack structured annotations. Automating the annotation of biomedical literature can therefore be critical to developing AI models that extract complex biomedical knowledge at scale. While LLMs can accelerate annotation, single-agent frameworks struggle with the complex ontologies and guidelines that form the basis of annotation. They also treat annotation as an isolated computation rather than a collaborative process. Recent studies employ Multi-Agent Debate (MAD) to overcome these bottlenecks, but such approaches are limited by the rigidity of reasoning models and by their inability to recover from initial prediction errors.
In this ongoing work, we propose a schema-guided ‘Voter-Adjudicator’ framework that imitates authentic human annotation workflows. By injecting disciplinary personas into independent voter agents, we induce epistemic diversity to broaden feature extraction. A central adjudicator then synthesizes these divergent votes. Finally, we aim to use epistemic signals from the system, such as voter disagreement and adjudicator confidence, to identify highly ambiguous instances for human escalation, and to estimate the correlation between the instances our system marks as highly uncertain and the instances on which human annotators disagreed. We hypothesize that divergence among voter agents combined with low adjudicator confidence is not merely algorithmic failure but a reliable proxy for true epistemic uncertainty. By accurately flagging these cases, the system can reduce human cognitive load and annotation time and focus expert intervention where it is most needed.
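The escalation rule described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how disagreement or confidence is measured, so the sketch assumes disagreement is the normalized Shannon entropy of the voter labels and that the adjudicator returns a confidence in [0, 1]. The names `vote_disagreement`, `decide`, `Decision`, and both threshold values are hypothetical and chosen only for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from math import log


@dataclass
class Decision:
    label: str            # label chosen by the adjudicator
    disagreement: float   # normalized entropy of voter labels, in [0, 1]
    confidence: float     # adjudicator confidence, in [0, 1]
    escalate: bool        # True if the instance should go to a human expert


def vote_disagreement(votes: list[str]) -> float:
    """Normalized Shannon entropy of the votes (assumed disagreement measure).

    Returns 0.0 when all voters agree and 1.0 when every voter
    proposes a different label.
    """
    n = len(votes)
    counts = Counter(votes)
    if n <= 1 or len(counts) == 1:
        return 0.0
    entropy = -sum((c / n) * log(c / n) for c in counts.values())
    return entropy / log(n)  # log(n) is the maximum entropy for n voters


def decide(
    votes: list[str],
    adjudicator_label: str,
    adjudicator_confidence: float,
    max_disagreement: float = 0.5,   # hypothetical threshold
    min_confidence: float = 0.7,     # hypothetical threshold
) -> Decision:
    """Escalate only when voters diverge AND the adjudicator is unsure."""
    disagreement = vote_disagreement(votes)
    escalate = (
        disagreement > max_disagreement
        and adjudicator_confidence < min_confidence
    )
    return Decision(adjudicator_label, disagreement, adjudicator_confidence, escalate)


if __name__ == "__main__":
    # Three persona-driven voters disagree and the adjudicator is unsure.
    print(decide(["gene", "disease", "chemical"], "gene", 0.4))
    # Voters agree, so the instance is kept regardless of confidence.
    print(decide(["gene", "gene", "gene"], "gene", 0.4))
```

In this sketch both signals must fire before an instance is escalated, mirroring the hypothesis that voter divergence combined with low adjudicator confidence marks genuine ambiguity. In practice the thresholds would presumably be tuned against instances where human annotators disagreed.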
Submission Number: 160