Abstract: Evaluating automatic paraphrase production systems is a difficult task as it involves, among other things, assessing the semantic proximity between two sentences.
Usual measures are based on lexical distances or, at best, on semantic embedding alignments.
The rise of large language models has provided tools to model relationships within a text thanks to the attention mechanism.
In this article, we introduce ParaPLUIE, a new measure based on a log-likelihood ratio from an LLM, to assess the quality of a potential paraphrase.
This measure is compared with usual measures on three datasets of manually labeled paraphrase and non-paraphrase pairs.
Two of these datasets predate this study and are known for their quality or difficulty on this task. The third, built for this study, is composed of LLM outputs.
According to our evaluations, the proposed measure is better at sorting sentence pairs by semantic proximity. In particular, it is much more independent of lexical distance and offers a straightforward classification threshold between paraphrases and non-paraphrases.
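The general idea of a log-likelihood-ratio paraphrase score can be sketched as follows. Note that ParaPLUIE's exact prompts and formulation are not given in this abstract, so the framing prompts and the stand-in language model below are purely illustrative assumptions.

```python
import math

def sequence_logprob(lm, context, continuation):
    """Sum of log P(token | context so far) over the continuation tokens."""
    total = 0.0
    ctx = list(context)
    for tok in continuation:
        total += lm(ctx, tok)
        ctx.append(tok)
    return total

def paraphrase_llr(lm, s1, s2):
    """Log-likelihood ratio of s2 under a 'paraphrase of s1' framing
    versus a neutral framing (hypothetical prompts, not ParaPLUIE's)."""
    tokens2 = s2.lower().split()
    pos = sequence_logprob(lm, ["paraphrase", "of:"] + s1.lower().split(), tokens2)
    neg = sequence_logprob(lm, ["a", "sentence:"], tokens2)
    return pos - neg

def toy_lm(context, token):
    # Crude stand-in for an LLM: tokens already seen in the context
    # are deemed more likely than unseen ones.
    return math.log(0.5 if token in context else 0.1)
```

With `toy_lm`, a sentence that reuses the vocabulary of `s1` receives a positive ratio, while an unrelated sentence scores near zero; a real LLM would capture semantic rather than purely lexical proximity.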
Paper Type: Long
Research Area: Semantics: Lexical and Sentence-Level
Research Area Keywords: paraphrasing
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Publicly available software and/or pre-trained models
Languages Studied: English
Submission Number: 3194