GIMM: A graph convolutional network-based paraphrase identification model to detecting duplicate questions in QA communities

Kunpeng Du, Xuan Zhang, Chen Gao, Rui Zhu, Qiong Nong, XianYu Yang, Chunlin Yin

Published: 2024, Last Modified: 21 May 2026 | Multim. Tools Appl. 2024 | CC BY-SA 4.0

Abstract: Paraphrase Identification (PI) is an important task in Natural Language Processing (NLP) that aims to determine whether two sentences expressed in different forms are semantically equivalent. It can be used to detect duplicate questions in QA communities (e.g., Quora and Stack Overflow). Many studies have applied Convolutional Neural Networks to capture rich matching information between sentence pairs layer by layer. However, only a limited number of studies have explored the more flexible Graph Convolutional Networks (GCNs) for this task. A GCN operates directly on a graph and learns each node's representation from the information of its neighbors. As a result, the interactive information between two sentences can be effectively integrated based on the local graph structure. In this paper, a Graph-based Interaction Matching model (GIMM) for PI is proposed. To build the interaction graph, GIMM treats each word as a node and connects nodes through two kinds of relations: word co-occurrence relations between the two sentences of a pair, and phrase relations within a single sentence. A GCN is then applied to learn richer word representations based on the local structure of the graph. Finally, the node representations are aligned by an attention mechanism to obtain a matching vector, and the PI result is produced by a fully connected layer. We compare the performance of GIMM with current baselines on the Quora and Stack Overflow datasets. Experimental results demonstrate that the proposed model achieves excellent performance on both datasets.
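The pipeline described in the abstract (interaction graph, then GCN, then attention alignment, then a fully connected classifier) can be illustrated with a minimal, untrained sketch in plain Python. This is not the authors' implementation, and several details are assumptions: cross-sentence "co-occurrence" edges are taken to link identical words; "phrase relations" are approximated by a hypothetical sliding window over adjacent words rather than a real phrase chunker or parser; the GCN uses the standard Kipf and Welling normalized propagation rule; and the matching features `[h, a, h - a, h * a]`, the random embeddings, and the single sigmoid output unit are illustrative choices. All function names are hypothetical.

```python
import math
import random


def build_interaction_graph(s1, s2, window=1):
    """Adjacency matrix over the word nodes of both sentences (s1 nodes first).

    Assumptions (hypothetical, not from the paper): cross-sentence edges link
    identical lowercased words, and phrase relations are approximated by linking
    words at most `window` positions apart inside the same sentence.
    Self-loops are included so every node keeps its own features.
    """
    n, offset = len(s1) + len(s2), len(s1)
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    # Intra-sentence edges: stand-in for phrase relations.
    for start, sent in ((0, s1), (offset, s2)):
        for i in range(len(sent)):
            for j in range(i + 1, min(i + window + 1, len(sent))):
                adj[start + i][start + j] = adj[start + j][start + i] = 1.0
    # Cross-sentence edges: the same word occurring in both sentences.
    for i, w1 in enumerate(s1):
        for j, w2 in enumerate(s2):
            if w1.lower() == w2.lower():
                adj[i][offset + j] = adj[offset + j][i] = 1.0
    return adj


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(adj, h, w):
    """One GCN layer, ReLU(D^-1/2 A D^-1/2 H W) (Kipf & Welling propagation rule)."""
    deg = [sum(row) for row in adj]  # >= 1 thanks to self-loops
    norm = [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(len(adj))]
            for i in range(len(adj))]
    return [[max(0.0, v) for v in row] for row in matmul(matmul(norm, h), w)]


def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    total = sum(e)
    return [v / total for v in e]


def align(ha, hb):
    """Attend from each node in ha over the nodes in hb (dot-product scores),
    then mean-pool the features [h, a, h - a, h * a] into one matching vector."""
    feats = []
    for h in ha:
        weights = softmax([sum(x * y for x, y in zip(h, g)) for g in hb])
        a = [sum(wt * g[k] for wt, g in zip(weights, hb)) for k in range(len(h))]
        feats.append(list(h) + a + [x - y for x, y in zip(h, a)]
                     + [x * y for x, y in zip(h, a)])
    return [sum(col) / len(feats) for col in zip(*feats)]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gimm_like_score(s1, s2, dim=8, seed=0):
    """Untrained end-to-end pass: graph -> GCN -> alignment -> linear layer.
    Embeddings and weights are random placeholders, so the score is not meaningful."""
    rng = random.Random(seed)
    vocab = {}

    def embed(word):
        # Hypothetical embedding table: one random vector per lowercased word.
        key = word.lower()
        if key not in vocab:
            vocab[key] = [rng.uniform(-1, 1) for _ in range(dim)]
        return vocab[key]

    h = [embed(w) for w in s1 + s2]
    w_gcn = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]
    h = gcn_layer(build_interaction_graph(s1, s2), h, w_gcn)
    h1, h2 = h[:len(s1)], h[len(s1):]
    match = align(h1, h2) + align(h2, h1)  # 8 * dim matching features
    w_fc = [rng.uniform(-1, 1) for _ in range(len(match))]
    return sigmoid(sum(x * y for x, y in zip(match, w_fc)))
```

For example, `gimm_like_score(["How", "to", "learn", "Python"], ["How", "do", "I", "learn", "Python"])` returns a value between 0 and 1, but with random weights that value carries no information; in a real model the embeddings, GCN weights, and classifier would presumably be trained end to end on labelled question pairs. Both alignment directions are pooled so that the matching vector does not depend on which question is listed first.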

External IDs: dblp:journals/mta/DuZGZNYY24