Beyond Pivot for Extracting Chinese Paraphrases

Yu Zhang, Le Qi, Linjie Wang, Linlin Yu, Ting Liu

2018 (modified: 13 Nov 2021), CCIR 2018. Readers: Everyone

Abstract: Paraphrasing is a critical issue in many Natural Language Processing (NLP) applications. Traditional pivot-based methods of extracting paraphrases require a large-scale bilingual parallel corpus, and the quality of the extracted paraphrases depends on the quality of both the parallel corpus and the word alignment. In this paper, we propose a method for Chinese paraphrase extraction. An online translation system is used to obtain the candidate paraphrases of a word. A deep neural network model combined with cosine similarity is then used to filter the candidates by computing the similarity between the word vector of a word and that of each candidate paraphrase. Experiments are conducted in two ways: (1) random sampling is used to manually verify the correctness of the extracted paraphrases, which improves significantly; (2) we design two Question Answering (QA) systems based on the NLPCC 2016 Document Based Question Answering (DBQA) corpus. One uses the BM25 model to retrieve the candidate answer sentences, and the other uses a Convolutional Neural Network (CNN) model. The extracted paraphrases are effective for question reformulation, raising the MRR from 56.33% to 60.21% (BM25) and from 63.82% to 66.60% (CNN) on the questions of the NLPCC 2016 DBQA corpus.
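The cosine-similarity part of the filtering step can be illustrated with a minimal sketch. It is not the authors' implementation: the deep neural network model is omitted, the word vectors are toy values, and the function names and the threshold of 0.5 are assumptions made only for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # similarity is undefined for a zero vector; treat as 0
    return dot / (norm_u * norm_v)


def filter_candidates(word, candidates, vectors, threshold=0.5):
    """Keep candidates whose vector is close enough to the word's vector.

    `vectors` maps a word to its embedding. `threshold` is a hypothetical
    cut-off, not a value taken from the paper.
    """
    if word not in vectors:
        return []
    kept = []
    for cand in candidates:
        # Candidates without a known vector cannot be scored, so skip them.
        if cand in vectors and cosine_similarity(vectors[word], vectors[cand]) >= threshold:
            kept.append(cand)
    return kept


# Toy embeddings (assumed values, for illustration only).
vectors = {
    "快乐": [0.9, 0.1, 0.0],
    "高兴": [0.8, 0.2, 0.1],
    "桌子": [0.0, 0.1, 0.9],
}

# In the paper the candidates come from an online translation system;
# here they are hard-coded.
kept = filter_candidates("快乐", ["高兴", "桌子", "未知"], vectors)
print(kept)
```

With these toy vectors only the candidate that points in nearly the same direction as the source word survives; the unrelated word and the out-of-vocabulary candidate are dropped.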
