Neural Machine Translation for Chinese Patent Medicine Instructions

Published: 01 Jan 2023, Last Modified: 28 Sept 2024, Venue: JCRAI 2023, License: CC BY-SA 4.0
Abstract: Little research has examined the validity of datasets for translating Chinese Patent Medicine instructions into English. Analyzing the Chinese source texts and the English texts generated by prominent translation engines, we observe that the translations are hard to read and that English translation standards are inconsistent. Few internet search platforms are designed specifically for Chinese Patent Medicine (CPM), and those that exist focus on specialized terminology related to Chinese herbal medicine. To address these problems, we first develop a Chinese Patent Medicine Instruction Dataset (CPMID) for Chinese-English translation. The dataset comprises 11,695 meticulously annotated and validated Chinese-English entries. We benchmark the task by training and testing multiple baselines, including the traditional models Seq2Seq+Attention (LSTM) and Transformer, as well as the pre-trained, released translation models SMaLL-100, NLLB-200, mBART-50, and ChatGPT. The dataset demonstrates its accuracy and effectiveness with an improvement of 42.5 BLEU, surpassing the prior state of the art by over 54.7%. The primary objective of using this dataset in future R&D is to provide a reliable retrieval system for foreign users of CPM. We believe that CPMID has the potential to facilitate the modernization of Traditional Chinese Medicine (TCM) and to contribute significantly to the field of Modern Medicine (MM).