Are We Evaluating Paraphrase Generation Accurately?

Anonymous

16 Nov 2021 (modified: 05 May 2023) | ACL ARR 2021 November Blind Submission | Readers: Everyone

Abstract: A paraphrase is a restatement of a text that conveys the same meaning using different expressions. Evaluating paraphrase generation (PG) is a complex task, and the field currently lacks a complete picture of the relevant criteria and metrics. In this paper, we survey the automatic evaluation metrics and human evaluation criteria used in PG evaluation. Based on the survey results, we propose a reference-free automatic evaluation toolkit and list clear human evaluation criteria. Moreover, we turn to paraphrase selection in downstream tasks and propose a simple but effective evaluation Filter model. It fuses multiple automatic metrics to fit human evaluation without requiring any references.
