Abstract: In recent years, language models have been widely applied to natural language processing tasks. The pre-trained language model BERT (Bidirectional Encoder Representations from Transformers), proposed by Devlin et al. [14], achieves strong performance across different tasks and requires only simple fine-tuning to adapt to each of them. In this paper, BERT is used as a baseline, and various information enhancement strategies are adopted to enrich its representation. Any obstetric EMR can be divided into textual information and numerical information. According to its importance for diagnosis, the textual information in an EMR can be further divided into indiscriminate information, basic information, and key information. It is therefore essential to enhance the model representation with this hierarchical information to improve diagnostic performance. BERT, however, takes the entire sequence as input, makes no distinction among the levels of hierarchical information, and ignores the importance of numerical information. In this paper, indiscriminate information processing and key diagnostic text fusion are used to handle the hierarchical information; in addition, textual feature enhancement and numerical feature fusion are performed through the enhanced layer. A novel Hierarchical Information Enhanced BERT (HIE-BERT) model is proposed for the diagnosis assistant. Evaluation results on the AAPD dataset and a dataset of more than 20,000 EMRs show that our model improves performance over the basic BERT model. The main contributions of this paper are summarized as follows: