Abstract: In the medical field, text produced by automatic speech recognition (ASR) is poorly readable because it lacks punctuation; worse, this can cause patients to misunderstand a doctor's orders. Restoring punctuation after ASR to improve readability is therefore an indispensable step. Most recent work fine-tunes downstream tasks on pre-trained models, but these models lack domain-specific knowledge. Furthermore, most research targets English; there is less work on Chinese and even less in the medical field. Motivated by this, we add Chinese medical data to the model's pre-training stage and incorporate the punctuation restoration task there as well. Accordingly, we propose a Punctuation Restoration Pre-training Mask Language Model (PRMLM) task for the pre-training stage and apply contrastive learning at this stage to strengthen the model. We then propose a Punctuation Prior Knowledge Fine-tuning (PKF) method so that the contrastive learning carries over more effectively when fine-tuning the downstream punctuation restoration task. On our medical-domain dataset, a series of comparisons with existing algorithms verifies the effectiveness of the proposed approach.
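The core idea of a PRMLM-style objective can be illustrated with a minimal data-preparation sketch: punctuation marks in already-punctuated text are replaced with mask tokens, and the model learns to recover them. This is an assumed, simplified reading of the task; the function name `make_prmlm_example`, the punctuation set, and the `"O"` label are illustrative choices, not details from the paper.

```python
# Hypothetical sketch of PRMLM-style data preparation (assumption, not the
# paper's exact pipeline): punctuation positions become [MASK] in the input,
# and the labels hold the original mark the model must restore.
PUNCT = {"，", "。", "？", "！", "、"}  # common Chinese punctuation marks

def make_prmlm_example(chars):
    """Turn a punctuated character sequence into (masked_input, labels).

    Punctuation positions become "[MASK]" in the input; labels hold the
    original mark at masked positions and "O" (no target) elsewhere.
    """
    masked, labels = [], []
    for ch in chars:
        if ch in PUNCT:
            masked.append("[MASK]")
            labels.append(ch)
        else:
            masked.append(ch)
            labels.append("O")
    return masked, labels

if __name__ == "__main__":
    # "Take the medicine after meals, twice a day."
    text = list("饭后服药，每日两次。")
    masked, labels = make_prmlm_example(text)
    print(masked)
    print(labels)
```

In this framing, punctuation restoration at inference time amounts to inserting a mask at every candidate boundary in unpunctuated ASR output and letting the pre-trained model predict either a punctuation mark or "no punctuation" at each slot.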