Evaluating story generation through automated metrics: reanalyzing the HANNA dataset

Léo Houairi, Conrad THIOUNN

20 Mar 2023 · OpenReview Archive Direct Upload · Readers: Everyone

Abstract: Automatic Story Generation (ASG) is a popular branch of Natural Language Processing (NLP). As in any field, improving its models requires reliable ways to measure the quality of their outputs. Since human evaluation is costly, the development of Automatic Evaluation Metrics (AEMs) that correlate with human judgement is a crucial area of research. In this paper, we use the HANNA dataset to benchmark the capabilities of different AEMs, reproducing some of the experiments from the paper that introduced this dataset. Our research and the structure of this paper are therefore largely drawn from it. Our code is available on GitHub: https://github.com/leohouairi/NLP-text-similarity. (The authors contributed equally to this article; the author order was therefore drawn from a Bernoulli distribution with parameter 0.5. The study engages only the authors and does not engage ENSAE Paris or INSEE.)
