Benchmark of automatic metrics on Automatic Story Generation: do results depend on correlation coefficients?
Abstract: Accurately estimating how well automatic metrics correlate with human judgment is crucial for assessing the metrics used to evaluate Automatic Story Generation (ASG) models and, in turn, for improving those models. The Kendall, Pearson, and Spearman correlation coefficients each measure this correlation differently and can therefore yield different rankings of metrics. This paper assesses the discrepancies in metric rankings across these three coefficients and emphasizes the importance of accounting for the particularities of each coefficient before any aggregation. Analyzing these differences clarifies the strengths and limitations of each coefficient and supports more reliable evaluation strategies for ASG models. Our findings also highlight the need for further research on how well different correlation measures serve the evaluation of ASG models.
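As a minimal illustration of how the choice of coefficient can change a metric ranking, the sketch below computes the Pearson, Spearman, and Kendall (tau-b) coefficients between human scores and two metrics, then ranks the metrics under each coefficient. The scores and the names `human`, `metric_a`, `metric_b`, and `rank_metrics` are hypothetical and chosen only for illustration; they are not data or code from the paper.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson linear correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall tau-b, computed from concordant and discordant pairs."""
    concordant = discordant = only_x_tied = only_y_tied = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # tied in both variables: contributes to neither term
        if dx == 0:
            only_x_tied += 1
        elif dy == 0:
            only_y_tied += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    return (concordant - discordant) / sqrt(
        (untied + only_x_tied) * (untied + only_y_tied)
    )


# Hypothetical toy data: human ratings for five stories and two metrics.
human = [1, 2, 3, 4, 5]
metrics = {
    "metric_a": [1, 2, 4, 8, 100],  # monotonic in human scores, but nonlinear
    "metric_b": [1, 2, 3, 5, 4],    # nearly linear, with one swapped pair
}


def rank_metrics(coefficient):
    """Metric names sorted from highest to lowest correlation with `human`."""
    return sorted(metrics, key=lambda m: coefficient(human, metrics[m]), reverse=True)


for name, coefficient in [
    ("pearson", pearson),
    ("spearman", spearman),
    ("kendall", kendall_tau_b),
]:
    scores = {m: round(coefficient(human, v), 3) for m, v in metrics.items()}
    print(name, rank_metrics(coefficient), scores)
```

On this toy data, Pearson places `metric_b` first because it penalizes the nonlinearity of `metric_a`, whereas the two rank-based coefficients place `metric_a` first because it orders the stories exactly as the human scores do. In practice one would likely use a statistics library rather than these hand-written functions; they are spelled out here only to keep the example self-contained.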