Distantly supervised end-to-end medical entity extraction from electronic health records with human-level quality

Alexander Nesterov; Dmitry Umerenkov

28 Sept 2020 (modified: 05 May 2023) | ICLR 2021 Conference Blind Submission | Readers: Everyone

Keywords: entity extraction, medical entity extraction, named entity recognition, named entity normalization, electronic health records, unsupervised learning, distant supervision.

Abstract: Medical entity extraction (EE) is a standard procedure used as a first stage in medical text processing. Usually, medical EE is a two-step process: named entity recognition (NER) and named entity normalization (NEN). We propose a novel method of doing medical EE from electronic health records (EHR) as a single-step multi-label classification task by fine-tuning a transformer model pretrained on a large EHR dataset. Our model is trained end-to-end in a distantly supervised manner using targets automatically extracted from a medical knowledge base. We show that our model learns to generalize for entities that are present frequently enough, achieving human-level classification quality for the most frequent entities. Our work demonstrates that medical entity extraction can be done end-to-end without human supervision and with human quality, given the availability of a large enough amount of unlabeled EHR and a medical knowledge base.

One-sentence Summary: We frame medical entity extraction as a single-step multi-label classification task trained on distant supervision labels and achieve human-level extraction quality for the most frequent entities.
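As a rough illustration of the setup described in the abstract, the sketch below builds multi-hot targets for a text by matching knowledge-base terms (the distant supervision step) and scores per-concept probabilities with binary cross-entropy (the multi-label objective). Everything here is an assumption made for illustration: the toy `KNOWLEDGE_BASE`, the concept IDs, the plain substring matching, and the hand-written probabilities standing in for a fine-tuned transformer's outputs. It is not the authors' labeling procedure or model.

```python
import math

# Hypothetical toy knowledge base: concept ID -> surface forms.
# These entries are illustrative only and do not come from the paper.
KNOWLEDGE_BASE = {
    "C_HYPERTENSION": ["hypertension", "high blood pressure"],
    "C_DIABETES": ["diabetes", "diabetes mellitus"],
    "C_COUGH": ["cough"],
}
CONCEPTS = sorted(KNOWLEDGE_BASE)  # fixed label order for the multi-hot vector


def distant_labels(text):
    """Multi-hot target: 1.0 if any surface form of a concept occurs in the text.

    Plain substring matching is an assumption; a real pipeline would need
    something more robust (tokenization, fuzzy matching, negation handling).
    """
    lowered = text.lower()
    return [
        1.0 if any(term in lowered for term in KNOWLEDGE_BASE[concept]) else 0.0
        for concept in CONCEPTS
    ]


def binary_cross_entropy(probs, targets, eps=1e-12):
    """Mean binary cross-entropy over labels, the usual multi-label loss."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(targets)


def predict(probs, threshold=0.5):
    """Return every concept whose predicted probability reaches the threshold."""
    return [concept for concept, p in zip(CONCEPTS, probs) if p >= threshold]


if __name__ == "__main__":
    note = "Patient reports high blood pressure and cough."
    targets = distant_labels(note)
    # Placeholder probabilities standing in for a model's sigmoid outputs.
    probs = [0.9, 0.1, 0.7]
    print(targets)
    print(binary_cross_entropy(probs, targets))
    print(predict(probs))
```

Because each concept gets its own independent probability, a single forward pass can emit any number of entities for a text, which is what lets recognition and normalization collapse into one classification step.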

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics

Reviewed Version (pdf): https://openreview.net/references/pdf?id=nA_UkZdg_R
