Abstract: The patient portal is a secure platform that enables patients to conveniently access their medical records and to communicate with and seek support from their care teams on a variety of healthcare issues. Care teams are receiving a rising number of patient portal messages (PPMs) as patient engagement with the portal grows, especially during the COVID-19 pandemic. This creates opportunities to develop artificial intelligence (AI) solutions that support care teams and reduce their workload. Previous studies have shown the potential of machine learning and natural language processing (NLP) for classifying PPMs to automate message triage. Recent progress in neural network architectures (e.g., the Transformer) and the emergence of large-scale pre-trained language models are pushing the state of the art for multiple NLP tasks, including text classification. However, little research has examined the performance of pre-trained language models on the PPM classification task. This study investigates how multiple state-of-the-art pre-trained NLP language models, differing in the language style and content of their pre-training corpora, perform on the multi-category PPM classification task. The results could contribute to the development of more advanced and precise PPM triage systems for clinical applications. We developed PPM classifiers based on five popular Bidirectional Encoder Representations from Transformers (BERT) language models: the generic models BERT and RoBERTa, the domain-specific models BioBERT and Bio_ClinicalBERT, and the source-specific model BERTweet, which were pre-trained on corpora from different domains and sources. The BERT-based models were fine-tuned and evaluated on an annotated corpus of 2,239 PPMs labeled with four categories. Our results show that the source-specific model BERTweet outperformed all other tested models, with an accuracy of 0.78 and an F1 score of 0.75. The domain-specific models BioBERT and Bio_ClinicalBERT achieved better results (F1 scores of 0.74 and 0.72) than the generic models BERT and RoBERTa (F1 scores of 0.70 and 0.66), respectively. BERTweet benefited from its large-scale English tweet pre-training corpus, whose informal and colloquial language is similar to that of PPMs. The domain-specific models BioBERT and Bio_ClinicalBERT outperformed the generic models because they were pre-trained on large-scale biomedical and clinical corpora and therefore learn representations of PPMs that better capture health concepts. These results show that language models pre-trained on corpora with a similar language style and domain concepts benefit downstream PPM classification. Combining multiple specialized language models to further improve performance would be worth exploring.
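The sketch below illustrates the general fine-tuning setup the abstract describes: loading one of the evaluated BERT-style encoders, fine-tuning it as a four-class sequence classifier, and reporting accuracy and F1. It is not the authors' released code; the checkpoint choice, file names (ppm_train.csv, ppm_test.csv), column names, and hyperparameters are illustrative assumptions, and the annotated PPM corpus itself is not public.

```python
# Minimal sketch of fine-tuning a BERT-style encoder for 4-category PPM
# classification with the Hugging Face Transformers Trainer.
import numpy as np
import evaluate
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

checkpoint = "vinai/bertweet-base"  # or bert-base-uncased, roberta-base,
                                    # dmis-lab/biobert-v1.1, emilyalsentzer/Bio_ClinicalBERT
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=4)

# Hypothetical CSV files with a "text" column (message body) and an integer
# "label" column (category id 0-3).
dataset = load_dataset("csv", data_files={"train": "ppm_train.csv",
                                          "test": "ppm_test.csv"})

def tokenize(batch):
    # Truncate long messages; padding is handled dynamically per batch.
    return tokenizer(batch["text"], truncation=True, max_length=128)

dataset = dataset.map(tokenize, batched=True)

accuracy_metric = evaluate.load("accuracy")
f1_metric = evaluate.load("f1")

def compute_metrics(eval_pred):
    preds = np.argmax(eval_pred.predictions, axis=-1)
    labels = eval_pred.label_ids
    return {
        "accuracy": accuracy_metric.compute(predictions=preds,
                                            references=labels)["accuracy"],
        "macro_f1": f1_metric.compute(predictions=preds, references=labels,
                                      average="macro")["f1"],
    }

args = TrainingArguments(
    output_dir="ppm-classifier",
    num_train_epochs=3,             # assumed; the abstract reports no hyperparameters
    per_device_train_batch_size=16,
    learning_rate=2e-5,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,            # enables dynamic padding via DataCollatorWithPadding
    compute_metrics=compute_metrics,
)
trainer.train()
print(trainer.evaluate())           # accuracy and macro F1 on the held-out split
```

Swapping the checkpoint string is the only change needed to compare the generic, domain-specific, and source-specific encoders under the same training configuration.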