Does Summary Evaluation Survive Translation to Other Languages?

Anonymous

08 Mar 2022 (modified: 05 May 2023), NAACL 2022 Conference Blind Submission
Readers: Everyone
Paper Link: https://openreview.net/forum?id=ayx8alCh-Vz
Paper Type: Long paper (up to eight pages of content + unlimited references and appendices)
Abstract: The creation of a quality summarization dataset is an expensive, time-consuming effort, requiring the production and evaluation of summaries by both trained humans and machines. The returns to such an effort would increase significantly if the dataset could be used in additional languages without repeating human annotations. To investigate how much we can trust machine translation of summarization datasets, we translate the English SummEval dataset into seven languages and compare the performance of automatic evaluation measures across them. We explore equivalence testing as the appropriate statistical paradigm for evaluating correlations between human and automated scoring of summaries. We also consider the effect of translation on the relative performance between measures. We find some potential for dataset reuse in languages similar to the source and along particular dimensions of summary quality. Our code and data can be found at https://github.com/PrimerAI/primer-research/.
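The abstract's central statistical idea is equivalence testing of correlations between human and automated summary scores. The page does not reproduce the paper's code, so the following is only a minimal, hypothetical sketch of how such a test might look using the TOST (two one-sided tests) procedure on Fisher-z-transformed correlations. The function name, the equivalence margin, the sample sizes, and the correlation values are illustrative assumptions, not figures or code from the paper.

```python
import numpy as np
from scipy.stats import norm


def tost_correlation_difference(r_src, n_src, r_tgt, n_tgt, margin=0.1, alpha=0.05):
    """TOST equivalence test for two independent correlations.

    Works on the Fisher-z scale:
      H0: |z(rho_src) - z(rho_tgt)| >= margin
      H1: |z(rho_src) - z(rho_tgt)| <  margin
    Returns the TOST p-value and whether equivalence is supported at level alpha.
    """
    d = np.arctanh(r_src) - np.arctanh(r_tgt)           # difference on Fisher-z scale
    se = np.sqrt(1.0 / (n_src - 3) + 1.0 / (n_tgt - 3))  # standard error of the difference
    p_lower = norm.sf((d + margin) / se)    # one-sided test against H0: d <= -margin
    p_upper = norm.cdf((d - margin) / se)   # one-sided test against H0: d >= +margin
    p = max(p_lower, p_upper)               # TOST p-value is the larger of the two
    return p, p < alpha


# Hypothetical example: correlation of an automatic measure with human scores
# on the English source vs. a machine-translated version (all numbers illustrative).
p_value, equivalent = tost_correlation_difference(
    r_src=0.45, n_src=1600,
    r_tgt=0.41, n_tgt=1600,
    margin=0.1,  # equivalence margin on the Fisher-z scale (assumption)
)
print(f"TOST p-value: {p_value:.4f}, equivalent within margin: {equivalent}")
```

Under this framing, a small p-value supports the claim that the translated dataset preserves the measure's correlation with human judgments within the chosen margin, whereas a conventional difference test could only fail to reject a null of no difference.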
Dataset: zip
Copyright Consent Signature (type Name Or NA If Not Transferrable): Spencer Braun
Copyright Consent Name And Address: Primer Technologies Inc., 244 Jackson Street #200, San Francisco, CA 94111