TL;DR: A new method for assessing the quality of similarity evaluators, showing the potential of Transformer-based language models to replace BLEU and ROUGE.
Abstract: We review three limitations of BLEU and ROUGE, the most popular metrics
used to assess reference summaries against hypothesis summaries. We then derive
criteria for how a good metric should behave, propose concrete ways to
assess the performance of a metric in detail, and show the potential of Transformer-based language models for the same task.