BnPC: A Corpus for Paraphrase Detection in Bangla

Anonymous

08 Mar 2022 (modified: 05 May 2023), NAACL 2022 Conference Blind Submission, Readers: Everyone
Paper Link: https://openreview.net/forum?id=4xNbfr01sn2
Paper Type: Short paper (up to four pages of content + unlimited references and appendices)
Abstract: In this paper, we present the first benchmark dataset for paraphrase detection in the Bangla language. Although Bangla is the sixth most spoken language in the world, paraphrase identification in Bangla is barely explored. Our dataset contains 8,787 human-annotated sentence pairs collected from the headlines of 23 newspaper outlets across four categories. We explore different linguistic features and pre-trained language models to benchmark the dataset. We perform a human evaluation experiment to obtain a better understanding of the task's constraints, which reveals intriguing insights. We make our dataset and code publicly available.