Identifying Duplicate Questions Leveraging Recurrent Neural Network

Published: 01 Jan 2022, Last Modified: 18 May 2025, TCCE 2022, CC BY-SA 4.0
Abstract: Community Question Answering (CQA) forums are predominant platforms where users can respond to others' questions and share their insights. The influx of new questions, phrased with considerable linguistic variability and ambiguity, leads to a haphazard collection of overlapping and unique questions. This raises the challenge of identifying equivalent questions so that users can be redirected to appropriate references. In this paper, we propose a recurrent neural network (RNN)-based architecture that employs word embeddings to assess whether a question pair is a duplicate. After a careful preprocessing step, we apply several pre-trained word-embedding models to represent questions semantically as fixed-dimensional real-valued vectors. We then apply two RNN architectures, namely Long Short-Term Memory (LSTM) and Bidirectional Long Short-Term Memory (BiLSTM), to encode the underlying meaning of each question pair. Finally, the models predict whether the question pair is a duplicate. Experimental results on the benchmark dataset show that our best models achieve competitive accuracies of 82% and 83%, contributing to the state of the art. In addition, our method is applicable to other textual similarity identification tasks.
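The abstract describes a pipeline of preprocessing, word-embedding lookup, an LSTM or BiLSTM encoder applied to both questions, and a duplicate/non-duplicate prediction. The paper's exact configuration is not given here, so what follows is only a minimal, untrained sketch in plain Python under several assumptions: `toy_embedding` is a hypothetical stand-in for a pre-trained embedding model (it returns deterministic pseudo-random vectors, not learned ones), both questions share one encoder (a siamese setup), and the pair features |h1 - h2| and h1 * h2 feeding a logistic output are a common choice, not necessarily the authors'. All sizes, seeds, and names are illustrative.

```python
import math
import random
import re

EMB_DIM = 8   # assumed embedding size (pre-trained models typically use 100-300)
HIDDEN = 6    # assumed LSTM hidden size


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def preprocess(text):
    """Lowercase, replace non-alphanumerics with spaces, split into tokens."""
    return re.sub(r"[^a-z0-9\s]", " ", text.lower()).split()


def toy_embedding(word, dim=EMB_DIM):
    """Hypothetical stand-in for a pre-trained embedding lookup (e.g. GloVe):
    a deterministic pseudo-random vector per word, NOT a learned embedding."""
    rng = random.Random("emb:" + word)
    return [rng.uniform(-0.5, 0.5) for _ in range(dim)]


class LSTM:
    """Single-layer LSTM with random, untrained weights; returns the final hidden state."""

    def __init__(self, input_dim, hidden_dim, seed=1):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        cols = input_dim + hidden_dim + 1  # input, previous hidden state, bias
        # One weight matrix per gate: input (i), forget (f), output (o), candidate (g).
        self.W = {gate: [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                         for _ in range(hidden_dim)] for gate in "ifog"}

    def encode(self, xs):
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in xs:
            z = x + h + [1.0]  # concatenate [x_t; h_{t-1}; 1]
            pre = {gate: [sum(w * v for w, v in zip(row, z)) for row in rows]
                   for gate, rows in self.W.items()}
            i = [sigmoid(v) for v in pre["i"]]
            f = [sigmoid(v) for v in pre["f"]]
            o = [sigmoid(v) for v in pre["o"]]
            cand = [math.tanh(v) for v in pre["g"]]
            c = [ft * ct + it * gt for ft, ct, it, gt in zip(f, c, i, cand)]
            h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
        return h


class BiLSTM:
    """Concatenates the final states of a forward and a backward LSTM pass."""

    def __init__(self, input_dim, hidden_dim):
        self.fwd = LSTM(input_dim, hidden_dim, seed=1)
        self.bwd = LSTM(input_dim, hidden_dim, seed=2)
        self.hidden_dim = 2 * hidden_dim

    def encode(self, xs):
        return self.fwd.encode(xs) + self.bwd.encode(xs[::-1])


class DuplicateClassifier:
    """Siamese model: one shared encoder for both questions, logistic output layer."""

    def __init__(self, encoder, seed=3):
        rng = random.Random(seed)
        self.encoder = encoder
        self.w = [rng.uniform(-0.1, 0.1) for _ in range(2 * encoder.hidden_dim)]
        self.b = 0.0

    def encode(self, question):
        return self.encoder.encode([toy_embedding(t) for t in preprocess(question)])

    def predict_proba(self, q1, q2):
        h1, h2 = self.encode(q1), self.encode(q2)
        # Symmetric pair features (assumed): |h1 - h2| and h1 * h2, element-wise.
        feats = [abs(a - b) for a, b in zip(h1, h2)] + [a * b for a, b in zip(h1, h2)]
        return sigmoid(sum(w * v for w, v in zip(self.w, feats)) + self.b)

    def is_duplicate(self, q1, q2, threshold=0.5):
        return self.predict_proba(q1, q2) >= threshold


lstm_model = DuplicateClassifier(LSTM(EMB_DIM, HIDDEN))
bilstm_model = DuplicateClassifier(BiLSTM(EMB_DIM, HIDDEN))
q1, q2 = "How do I learn Python?", "What is the best way to learn Python?"
# Weights are untrained, so these scores only demonstrate the data flow.
print(lstm_model.predict_proba(q1, q2), bilstm_model.predict_proba(q1, q2))
```

Training (for example, binary cross-entropy with backpropagation through time) is omitted. The symmetric pair features make the score independent of question order, and the BiLSTM encoding is twice the hidden size because it concatenates the forward and backward final states. In practice a framework such as PyTorch (`torch.nn.LSTM` with `bidirectional=True`) or Keras (the `Bidirectional` wrapper) would supply these layers.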