Abstract: In this paper, we take an updated look at the paraphrase identification task. We analyze commonly used English-language datasets such as MRPC, PAWS, and QQP. We study usage levels of these datasets, showing that dataset usage is heavily skewed towards MRPC. We also study and compare qualitative and quantitative characteristics of the datasets. We investigate the generalization performance of modern models trained on these datasets, showing that models do not generalize well across datasets. Lastly, we demonstrate methods to improve the generalization performance of models, showing that improved label consistency and MNLI pre-training are useful.
Paper Type: long
Research Area: Machine Learning for NLP