Abstract: Automatic story generation is a complex NLP task whose evaluation techniques have been studied less than those for summarization or data-to-text generation. In this analysis, we focus on how relevant existing automatic metrics, both traditional and more recent, are for evaluating this kind of task. Using a dataset annotated by human evaluators, we compare automatic metrics to human judgments, look for correlations between them, and measure how well automatic metrics predict some of the human metrics. Our results mainly show that the automatic metrics are highly similar to one another and that they struggle to predict human metrics, even when combined.
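The abstract describes the analysis only at a high level; as a rough illustration, a minimal sketch of the kind of comparison it outlines (assuming standard metrics such as BLEU and BERTScore, and using synthetic placeholder scores rather than the paper's data) could look like this:

```python
# Minimal sketch (not the authors' code) of the analysis the abstract describes:
# correlate automatic metric scores with a human rating, then try to predict the
# human metric from several automatic metrics combined. All scores are synthetic.
import numpy as np
from scipy.stats import spearmanr
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_stories = 100

# Hypothetical per-story scores: two automatic metrics and one human rating.
bleu = rng.uniform(0, 1, n_stories)
bertscore = 0.5 * bleu + rng.normal(0, 0.2, n_stories)  # correlated with BLEU
human_coherence = rng.uniform(1, 5, n_stories)           # human annotation (1-5)

# 1) Correlation between an automatic metric and a human metric.
rho, p = spearmanr(bleu, human_coherence)
print(f"Spearman rho(BLEU, human coherence) = {rho:.3f} (p = {p:.3f})")

# 2) Predicting the human metric from combined automatic metrics.
X = np.column_stack([bleu, bertscore])
r2 = cross_val_score(LinearRegression(), X, human_coherence, cv=5, scoring="r2")
print(f"Mean cross-validated R^2 = {r2.mean():.3f}")
```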