The Ghost Annotator: a Framework to Explore Human Label Variation in Content Moderation through Conformal Prediction

ACL ARR 2026 January Submission 7407 Authors

06 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: Bias detection, Conformal prediction, Human-model alignment, Evaluation, Uncertainty estimation
Abstract: Current research predominantly focuses on model performance while overlooking uncertainty, particularly as LLMs increasingly generate annotated data. We introduce a framework combining conformal prediction with collaborative filtering to detect LLM biases. Using Non-Conformity Scores (NCS), we introduce the Ghost Prediction metric and Ghost Annotator concept to quantify and profile cases where models diverge from all human annotations. Applying Cosine similarity measures, we identify systematic biases along sociodemographic axes. Evaluating four LLMs across four content moderation datasets we revealed that smaller LLMs tend to be more confident yet less aligned with human annotations compared to larger models, and across all models, uncertainty increases as annotator disagreement rises, mirroring collective human behavior. Finally the Ghost Annotator framework unveils strong alignment between LLMs and annotators of a specific gender on particular datasets.
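The abstract's conformal-prediction component can be illustrated with a generic split conformal classifier. The sketch below is not the paper's method (the paper's NCS definition and the Ghost Prediction metric are not specified here); it only shows the standard mechanics of turning non-conformity scores into prediction sets with a finite-sample-corrected calibration quantile. All names and the toy data are illustrative assumptions.

```python
import numpy as np

def conformal_prediction_sets(cal_scores, test_scores, alpha=0.1):
    """Generic split conformal prediction for classification (a sketch,
    not the paper's exact procedure).

    cal_scores:  (n_cal,) non-conformity score of the TRUE label for each
                 calibration example (e.g. 1 - p_model(y_true | x)).
    test_scores: (n_test, n_classes) non-conformity score of every
                 candidate label for each test example.
    Returns a boolean (n_test, n_classes) mask: True means the label is
    included in the prediction set at target coverage 1 - alpha.
    """
    n = len(cal_scores)
    # Finite-sample-corrected quantile level, clipped to 1 for small n.
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(cal_scores, level, method="higher")
    # A label enters the set iff its score does not exceed the threshold.
    return test_scores <= q

# Toy usage: 4 calibration scores, 2 test examples, 2 candidate labels.
cal = np.array([0.1, 0.2, 0.3, 0.4])
test = np.array([[0.05, 0.50],
                 [0.35, 0.90]])
mask = conformal_prediction_sets(cal, test, alpha=0.5)
```

Under this construction, a "ghost prediction" in the paper's sense would plausibly correspond to a test item whose prediction set excludes every label assigned by human annotators; the mask above is the object one would check for that.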
Paper Type: Long
Research Area: Ethics, Bias, and Fairness
Research Area Keywords: model bias/fairness evaluation
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 7407