Supervised Contrastive Distillation for Enhanced Story Engagement Evaluation

ACL ARR 2025 May Submission5009 Authors

20 May 2025 (modified: 03 Jul 2025) · CC BY 4.0
Abstract: Large language models (LLMs) have demonstrated strong performance across a range of evaluation tasks, from sentiment analysis to factual verification, and are increasingly used to generate high-quality annotations, such as assessments of story quality. While LLMs have shown success in evaluating narratives, most existing metrics focus on objective properties rather than subjective aspects such as $\textit{engagement}$, which captures how much a reader is drawn into a story. We introduce a Supervised Contrastive Distillation (SCD) framework that distills fine-grained pairwise judgments, sourced from human annotations, together with explanatory knowledge from powerful teacher models, into more efficient student models for evaluating story engagement. Our approach leverages a contrastive loss that aligns predicted preferences with human judgments while penalizing confidence mismatches. We validate our framework on the HANNA dataset, a human-annotated benchmark derived from the WritingPrompts corpus, and demonstrate its effectiveness in producing accurate and computationally efficient comparative evaluations. Our distilled student model achieves 40% higher accuracy than GPT-4 while reducing inference costs by approximately 80%, offering a compact yet precise evaluator.
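The abstract does not give the exact form of the loss, so the sketch below is only one plausible instantiation of a pairwise contrastive distillation objective: the student's preference probability (a sigmoid of the score margin, Bradley-Terry style) is aligned with human pairwise labels, with an optional penalty for deviating from the teacher's confidence. The function name, the `alpha` weighting, and the use of MSE for the confidence term are all assumptions for illustration, not the authors' method.

```python
import torch
import torch.nn.functional as F

def pairwise_contrastive_distillation_loss(
    student_scores_a,    # student engagement scores for story A, shape (batch,)
    student_scores_b,    # student engagement scores for story B, shape (batch,)
    human_prefs,         # 1.0 if annotators preferred A over B, else 0.0
    teacher_probs=None,  # optional teacher confidence that A > B, shape (batch,)
    alpha=0.5,           # hypothetical weight balancing human vs. teacher signal
):
    # Probability the student prefers A, from the score margin
    # (Bradley-Terry style pairwise comparison).
    p_student = torch.sigmoid(student_scores_a - student_scores_b)

    # Align predicted preferences with human pairwise judgments.
    human_loss = F.binary_cross_entropy(p_student, human_prefs)

    # Penalize confidence mismatches against the teacher's soft labels
    # (assumed MSE; the paper may use a different divergence).
    if teacher_probs is not None:
        confidence_loss = F.mse_loss(p_student, teacher_probs)
        return alpha * human_loss + (1 - alpha) * confidence_loss
    return human_loss
```

Under these assumptions, the teacher term distills graded confidence rather than hard labels, which is what lets a small student reproduce fine-grained pairwise judgments cheaply.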
Paper Type: Short
Research Area: Generation
Research Area Keywords: automatic evaluation, efficient models
Contribution Types: NLP engineering experiment, Approaches to low-compute settings: efficiency
Languages Studied: English
Submission Number: 5009