Toward Preference-Aware Story Evaluation via Ranking, Rating and Reasoning

Anonymous

17 Dec 2021 (modified: 05 May 2023) ACL ARR 2021 December Blind Submission Readers: Everyone
Abstract: Existing automatic story evaluation methods place a premium on story coherence, deviating from actual human preference. We go beyond this limitation by presenting the more challenging task of \textbf{preference-aware story evaluation}. Given either a machine-generated or a human-written story, the task requires the machine to output a preference score that reflects human preference, along with specific ratings and comments for various aspects (e.g., opening, character-shaping). To support this novel task, we introduce a new dataset, namely \textbf{StoR3}, comprising (i) 100k ranked story pairs and (ii) 46k ratings and comments on various aspects of the stories. To move toward preference-aware evaluation, we propose a model that uses the \textit{upvote count} as the criterion. Experiments show that the scores produced by our model correlate strongly with human preference. We further find that combining aspect ratings and comments improves performance. Our dataset and benchmarks are publicly available to advance research on story evaluation.
Paper Type: long
Consent To Share Data: yes