PEM: A Paraphrase Evaluation Metric Exploiting Parallel Texts

Chang Liu, Daniel Dahlmeier, Hwee Tou Ng

2010 (modified: 10 Nov 2022) | EMNLP 2010 | Readers: Everyone

Abstract: We present PEM, the first fully automatic metric to evaluate the quality of paraphrases, and consequently, that of paraphrase generation systems. Our metric is based on three criteria: adequacy, fluency, and lexical dissimilarity. The key component in our metric is a robust and shallow semantic similarity measure based on pivot language N-grams that allows us to approximate adequacy independently of lexical similarity. Human evaluation shows that PEM achieves high correlation with human judgments.
