CMedMi: Text Similarity Detection of Chinese Medical Question Based on Mutual Information

Minfeng Lu, Qifei Zhang, Huilai Zhou

Published: 2025, Last Modified: 07 Jan 2026, IEEE Access 2025, CC BY-SA 4.0

Abstract: Chinese medical question similarity detection is a natural language processing task that assesses whether two Chinese medical questions are semantically similar. It has broad applications in medical information retrieval and question-answering systems. The task is difficult for two reasons. First, because of the inherent complexity and ambiguity of the Chinese language and the specialized nature of the medical field, different questions may refer to the same disease using different terminology. Second, identical terms may carry different meanings in different contexts. To address these challenges, we propose a similarity detection method based on mutual information, which is used to extract richer textual information from the medical question corpus. We first obtain a text embedding vector from a Chinese pre-trained model and pass it through a text feature encoder to produce an encoded vector. Both the embedding vector and the encoded vector are then fed into a similarity detection network and a mutual information maximization network. By optimizing the objective functions of these two networks simultaneously, we obtain a refined encoded vector that is used to predict question similarity. Experiments on the TCAI20 and cMedQQ datasets show that the method detects medical question similarity effectively and achieves significantly better performance than many traditional methods. These results demonstrate the feasibility and effectiveness of the proposed approach.
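
The joint optimization described in the abstract can be illustrated with a minimal sketch. The abstract does not state which mutual information estimator or loss weighting the authors use, so everything below is an assumption made for illustration: the InfoNCE contrastive bound stands in for the mutual information objective, binary cross-entropy stands in for the similarity objective, the dot-product critic requires the embedding and encoded vectors to share a dimension, and the names `info_nce_loss`, `bce_loss`, `joint_loss`, and `mi_weight` are hypothetical.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def info_nce_loss(embeddings, encodings):
    """InfoNCE loss over a batch (assumed estimator, not taken from the paper).

    Each encoded vector should score highest against its own embedding
    and lower against the other embeddings in the batch. Minimizing this
    loss maximizes a lower bound on the mutual information between the
    two views: I >= log(batch_size) - loss.
    """
    n = len(embeddings)
    total = 0.0
    for i in range(n):
        scores = [dot(encodings[i], embeddings[j]) for j in range(n)]
        m = max(scores)  # subtract the max for numerical stability
        log_denominator = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_denominator - scores[i]
    return total / n


def bce_loss(prob, label):
    """Binary cross-entropy for one predicted similarity probability."""
    eps = 1e-12  # avoids log(0)
    return -(label * math.log(prob + eps)
             + (1 - label) * math.log(1 - prob + eps))


def joint_loss(sim_probs, labels, embeddings, encodings, mi_weight=0.1):
    """Similarity loss plus a weighted MI term (the weighting is an assumption)."""
    sim = sum(bce_loss(p, y) for p, y in zip(sim_probs, labels)) / len(labels)
    return sim + mi_weight * info_nce_loss(embeddings, encodings)


# Toy batch: two questions, 2-dimensional vectors.
embeddings = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[2.0, 0.0], [0.0, 2.0]]    # encodings that match their embeddings
shuffled = [[0.0, 2.0], [2.0, 0.0]]   # encodings paired with the wrong embedding

print(info_nce_loss(embeddings, aligned) < info_nce_loss(embeddings, shuffled))
print(joint_loss([0.9, 0.2], [1, 0], embeddings, aligned))
```

In a real system the vectors would come from a pre-trained Chinese language model and a trainable encoder, and both terms would be minimized by gradient descent; the sketch only shows how the two objectives might be combined into one scalar.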

External IDs: dblp:journals/access/LuZZ25