Keywords: Oncology extraction, pathology report informatics
TL;DR: OncoNLP and OncoNLP-Assist propose a new approach for interacting with medical records, leveraging the powerful capabilities of BERT and generative models.
Abstract: Information extraction from clinical text is needed to comprehend patient conditions and determine anticancer treatment.
Existing NLP systems, such as Amazon Comprehend Medical, Clamp, and DeepPhe, are proprietary and suboptimal due to the shift in the textual distribution and expression of cancer phenotypes.
We introduce OncoNLP, a natural language processing toolkit that comprises deep neural network BERT models designed to extract cancer phenotype and related biomedical information.
Currently, the toolkit contains two primary components: a Biomedical BERT (BiomedBERT) model to extract general medical information and a Cancer BERT (CaBERT) model to identify the primary tumor site and histology.
We evaluate the performance of BiomedBERT on the Informatics for Integrating Biology & the Bedside (i2b2) dataset against Amazon Comprehend Medical and Clamp.
BiomedBERT outperforms all other methods with an exact matching F1-score of 88.5%.
Next, we evaluate CaBERT against DeepPhe on a Moffitt dataset of over 2000 clinical notes; CaBERT shows a 50% improvement on both the tumor site and histology tasks.
Lastly, we introduce a knowledge prompt with OncoNLP and engineer it for large language models.
We name it OncoNLP-Assist, a chatbot system powered by OncoNLP and Llama2 that can extract information from electronic health records and interact with physicians.
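The exact matching F1-score mentioned in the abstract can be illustrated with a minimal sketch. It assumes each entity is a (start, end, label) tuple and that a prediction counts as correct only when all three fields agree with a gold annotation. The function name `exact_match_f1` and the toy spans are hypothetical and are not taken from the OncoNLP evaluation code.

```python
from typing import Iterable, Tuple

# Assumed representation: (start offset, end offset, entity label).
Entity = Tuple[int, int, str]


def exact_match_f1(gold: Iterable[Entity], pred: Iterable[Entity]) -> float:
    """Entity-level F1 where a prediction counts only if span and label match exactly."""
    gold_set, pred_set = set(gold), set(pred)
    true_pos = len(gold_set & pred_set)
    if true_pos == 0:
        # Covers empty inputs and the case of no exact matches.
        return 0.0
    precision = true_pos / len(pred_set)
    recall = true_pos / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Toy example with made-up spans: the second prediction has the right label
# but a wrong end offset, so it does not count under exact matching.
gold = [(0, 9, "problem"), (15, 24, "treatment"), (30, 38, "test")]
pred = [(0, 9, "problem"), (15, 22, "treatment"), (30, 38, "test")]
score = exact_match_f1(gold, pred)
```

Under this strict criterion, a prediction that overlaps the gold span without matching its boundaries earns no credit, which is why exact matching scores are typically lower than relaxed (overlap-based) ones.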
Track: 7. Digital radiology and pathology
Registration Id: KKNBX7SVWRJ
Submission Number: 437