EmoPair: A New Paradigm for Measuring Emotional Affect

Published: 03 Jun 2026, Last Modified: 10 Jun 2026 | AI4GOOD Workshop 2026 Regular | Everyone | CC BY 4.0
Keywords: affective computing, pairwise comparison, AltTest, LLM-as-a-judge, computational social science
TL;DR: Replacing scalar emotion ratings with validated LLM pairwise comparisons produces more reliable annotations and better-performing models for measuring emotional Arousal and Dominance in text without reducing performance on Valence.
Abstract: The continuous Valence-Arousal-Dominance (VAD) framework maps emotions into a high-resolution, three-dimensional space, but expanding VAD datasets via manual annotation is costly and prone to subjective inconsistencies. To address this, we introduce EmoPair, which updates the EmoBank corpus by replacing direct scalar ratings with a scalable, LLM-driven pairwise comparison approach. This shift substantially improved manual inter-rater reliability, increasing Krippendorff's Alpha from 0.595 to 0.896 for Arousal and from 0.570 to 0.865 for Dominance. Using Concept-Guided Chain-of-Thought (CGCoT) prompting, together with Alternative Annotator Test (AltTest) validation of the automated LLM annotations against a manually annotated sample, we generated reliable automated pairwise judgments and translated them into continuous scalar ratings via the Bradley-Terry model. In benchmarks with fine-tuned RoBERTa-large models (PERT and reward-based), EmoPair-trained models significantly outperformed EmoBank baselines on Arousal (84-85% vs. 73-74% accuracy) and Dominance (77% vs. 61-65%). These results demonstrate that pairwise comparisons provide superior, behaviorally aligned supervision data for emotional affect.
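As an illustration of the step that turns pairwise judgments into scalar ratings, the following is a minimal sketch of fitting a Bradley-Terry model with the standard minorization-maximization (MM) update. It is not the authors' implementation: the function name `fit_bradley_terry`, the `smoothing` pseudo-count, and the toy comparison data are all assumptions made for this example.

```python
import math
from collections import defaultdict


def fit_bradley_terry(comparisons, n_iter=500, tol=1e-10, smoothing=0.01):
    """Fit Bradley-Terry log-strengths from (winner, loser) pairs.

    `smoothing` is a small pseudo-win added to every item (an assumption of
    this sketch) so that items with zero wins keep a positive strength.
    """
    wins = defaultdict(float)         # total wins per item
    pair_counts = defaultdict(float)  # number of comparisons per unordered pair
    items = set()
    for winner, loser in comparisons:
        if winner == loser:
            continue  # a self-comparison carries no information
        wins[winner] += 1.0
        pair_counts[frozenset((winner, loser))] += 1.0
        items.update((winner, loser))
    items = sorted(items)

    strength = {i: 1.0 for i in items}
    for _ in range(n_iter):
        updated = {}
        for i in items:
            # MM denominator: sum over opponents j of n_ij / (p_i + p_j)
            denom = 0.0
            for j in items:
                if j == i:
                    continue
                n_ij = pair_counts.get(frozenset((i, j)), 0.0)
                if n_ij:
                    denom += n_ij / (strength[i] + strength[j])
            numer = wins.get(i, 0.0) + smoothing
            updated[i] = numer / denom if denom > 0 else strength[i]
        # Strengths are only identified up to scale, so renormalize to sum to 1.
        total = sum(updated.values())
        updated = {i: v / total for i, v in updated.items()}
        change = max(abs(updated[i] - strength[i]) for i in items)
        strength = updated
        if change < tol:
            break

    # Report centered log-strengths as the continuous scalar ratings.
    logs = {i: math.log(v) for i, v in strength.items()}
    mean_log = sum(logs.values()) / len(logs)
    return {i: v - mean_log for i, v in logs.items()}


# Toy data: each tuple is (text judged higher, text judged lower) on one dimension.
comparisons = (
    [("a", "b")] * 3 + [("b", "a")]
    + [("b", "c")] * 3 + [("c", "b")]
    + [("a", "c")] * 3 + [("c", "a")]
)
scores = fit_bradley_terry(comparisons)
for item, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(item, round(score, 3))
```

Under the Bradley-Terry model, the probability that item i is judged above item j is p_i / (p_i + p_j), so the fitted log-strengths place every text on a single continuous scale even though annotators only ever made relative judgments.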
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 99