Semantic Ontology for Paraphrase Classification

ACL ARR 2024 June Submission1285 Authors

14 Jun 2024 (modified: 02 Jul 2024), ACL ARR 2024 June Submission, CC BY 4.0
Abstract: Paraphrase classification is a useful NLP task used to identify texts with the same meaning. However, automated paraphrase classification is difficult to apply in practice because of the subjectivity involved in determining whether two sentences are similar enough to be considered paraphrases. We propose an ontology called Semantic Paraphrase Types (SPT) that describes a set of possible semantic relationships between two texts, covering two types of paraphrases and three types of non-paraphrases. Based on this ontology, we created a new set of labels on top of the commonly used MRPC dataset, yielding a new classification benchmark task called SPT Classification, with explanations for a subset of the dataset. We hope that our contributions will improve the usefulness of automatic paraphrase classification and generation methods for various real-world NLP applications. We will release the dataset and associated models and code for the baselines when the paper is accepted.
Paper Type: Long
Research Area: Semantics: Lexical and Sentence-Level
Research Area Keywords: paraphrase, semantics, nlp
Contribution Types: NLP engineering experiment, Publicly available software and/or pre-trained models, Data resources
Languages Studied: English
Submission Number: 1285