PrefScore: Pairwise Preference Learning for Reference-free Single-document Summarization Quality Assessment
Abstract: There has long been a need to evaluate machine-generated summaries without a human-written reference summary. Inspired by the preference labeling used in existing work on summarization evaluation, we propose to judge summary quality by learning a preference ranking of summaries with the Bradley-Terry power ranking model, trained on inferior summaries generated from a base summary. Despite the simplicity of our method, extensive experiments on several datasets show that our weakly supervised scheme produces scores that correlate highly with human ratings.
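To make the pairwise objective concrete, here is a minimal sketch of a Bradley-Terry preference loss between a base summary and a degraded copy. It is an illustration under stated assumptions, not the paper's implementation: the function names (`bt_preference_prob`, `bt_pair_loss`, `drop_words`) are hypothetical, the scalar scores stand in for the output of whatever scoring model is trained, and random word deletion is only one assumed way to generate an inferior summary.

```python
import math
import random


def bt_preference_prob(score_a: float, score_b: float) -> float:
    """Bradley-Terry probability that A is preferred over B,
    with strengths exp(score_a) and exp(score_b)."""
    # exp(a) / (exp(a) + exp(b)) simplifies to a logistic of the difference.
    return 1.0 / (1.0 + math.exp(score_b - score_a))


def bt_pair_loss(score_base: float, score_inferior: float) -> float:
    """Negative log-likelihood that the base summary beats its degraded copy."""
    return -math.log(bt_preference_prob(score_base, score_inferior))


def drop_words(summary: str, drop_rate: float, seed: int = 0) -> str:
    """Assumed corruption (hypothetical): delete words at random."""
    rng = random.Random(seed)
    words = summary.split()
    kept = [w for w in words if rng.random() >= drop_rate]
    # Keep at least one word so the degraded summary is never empty.
    return " ".join(kept or words[:1])
```

In this sketch, the loss shrinks as the scorer assigns the base summary a higher score than its degraded copy, so minimizing it over many such pairs would push the scorer toward the intended preference order without any human ratings.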
Paper Type: short