Abstract: In this article, we benchmark the different automatic metrics used to evaluate the text summarization task. Such a benchmark was already carried out in the SummEval article; we propose an analysis that complements it. We reuse the same dataset and consider correlations between automatic metrics and human metrics not only at the system level but also at the level of the type of summary studied (SumUp-level correlation). Moreover, we compute both the Spearman and Pearson correlations and include three new metrics in our analysis: DepthScore, BaryScore, and InfoLM. The goal is to better understand the relationship between automatic evaluation metrics and human metrics for the text summarization task, and thus to direct research toward creating metrics that are more strongly correlated with human metrics.
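As a minimal sketch of the kind of correlation analysis described above (not the paper's actual code), the snippet below computes system-level Pearson and Spearman correlations between an automatic metric and human scores using scipy; the per-system values shown are hypothetical placeholders.

```python
# Minimal sketch: system-level correlation between an automatic metric and
# human scores. Assumes one averaged score per summarization system.
from scipy.stats import pearsonr, spearmanr

# Hypothetical per-system mean scores (one value per system).
metric_scores = [0.31, 0.28, 0.35, 0.22, 0.30]  # e.g., mean automatic-metric score
human_scores = [3.8, 3.5, 4.1, 3.0, 3.7]        # e.g., mean human rating

pearson_r, _ = pearsonr(metric_scores, human_scores)
spearman_rho, _ = spearmanr(metric_scores, human_scores)

print(f"Pearson r: {pearson_r:.3f}, Spearman rho: {spearman_rho:.3f}")
```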