Another Look At Paraphrase Identification

Anonymous

16 Dec 2022 (modified: 05 May 2023)
ACL ARR 2022 December Blind Submission
Readers: Everyone
Abstract: In this paper, we take an updated look at the paraphrase identification task. We analyze commonly used English-language datasets such as MRPC, PAWS, and QQP. We study usage levels of these datasets, showing that dataset usage is heavily skewed towards MRPC. We also study and compare qualitative and quantitative characteristics of the datasets. We investigate the generalization performance of modern models trained on these datasets, showing that models do not generalize well across datasets. Lastly, we demonstrate methods to improve the generalization performance of models, showing that improved label consistency and MNLI pre-training are useful.
Paper Type: long
Research Area: Machine Learning for NLP