Extracting Paraphrases of Technical Terms from Noisy Parallel Software Corpora

2009 (modified: 12 Nov 2022), ACL/IJCNLP (Short Papers) 2009
Abstract: In this paper, we study the problem of extracting technical paraphrases from a parallel software corpus, namely, a collection of duplicate bug reports. Paraphrase acquisition is a fundamental task in the emerging area of text mining for software engineering. Existing paraphrase extraction methods are not entirely suitable here because bug reports are noisy. We propose a number of techniques to address the noisy data problem. The empirical evaluation shows that our method significantly improves on an existing method, by up to 58%.