Alignment of Chinese-English Medical Terminology in Small-Sample Scenarios: A Two-Stage Approach

Published: 01 Jan 2024, Last Modified: 13 May 2025 · BIBM 2024 · CC BY-SA 4.0
Abstract: Cross-lingual terminology alignment is an important task in the field of medical terminology. Through cross-lingual alignment, medical terms from different languages can be accurately mapped to their corresponding concepts, establishing a unified, multilingual medical terminology fusion system. However, the scarcity of annotated Chinese-English parallel corpora for medical terminology poses a challenge when training Chinese-English terminology alignment models, resulting in decreased alignment accuracy. To address this issue, this paper proposes a hybrid approach that combines Large Language Models (LLMs) and Pretrained Language Models (PLMs), leveraging the rich multilingual knowledge embedded in LLMs to obtain more comprehensive Chinese-English terminology information for assisting cross-lingual terminology alignment. However, LLMs require significant computational resources and memory, making them less suitable for large-scale alignment tasks. To tackle this problem, a confidence sampling strategy is introduced that delegates only challenging samples to the LLM for re-ranking, thereby reducing resource costs. Additionally, a prompt strategy tailored to terminology alignment tasks is proposed to enhance the accuracy of the LLM's predictions. We evaluate our method on the mapping files of two open medical terminologies, and the experimental results demonstrate that our method outperforms baseline methods by 2% on the Hits@1 and Hits@10 metrics. Our code and data are available at https://github.com/Bruce-Y12/two-stage-ranking.
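The confidence sampling strategy described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the margin-based confidence measure, the threshold value, and all function names are assumptions introduced for clarity.

```python
# Hypothetical sketch of the two-stage idea from the abstract: samples the
# PLM ranks confidently keep their ranking; low-confidence samples are
# delegated to an LLM for re-ranking. The top-2 score margin used as the
# confidence measure is an illustrative assumption, not the paper's method.

def confidence(scores):
    """Margin between the top-2 PLM similarity scores (assumed measure)."""
    top = sorted(scores, reverse=True)
    return top[0] - top[1] if len(top) > 1 else top[0]

def two_stage_rank(samples, llm_rerank, threshold=0.1):
    """samples: list of (source_term, [(candidate_term, plm_score), ...]).
    llm_rerank: callable delegating a hard sample to the LLM (assumed API).
    Returns a dict mapping each source term to its final candidate ranking."""
    results = {}
    for term, cands in samples:
        scores = [s for _, s in cands]
        ranked = [c for c, _ in sorted(cands, key=lambda x: -x[1])]
        if confidence(scores) < threshold:
            # Challenging sample: delegate to the LLM for re-ranking.
            ranked = llm_rerank(term, ranked)
        results[term] = ranked
    return results
```

Only the low-margin samples incur an LLM call, which is how the strategy reduces resource costs relative to re-ranking every candidate list with the LLM.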