DrugEHRQA: A Question Answering Dataset on Structured and Unstructured Electronic Health Records For Medicine Related Queries

Anonymous

16 Nov 2021 (modified: 05 May 2023) | ACL ARR 2021 November Blind Submission | Readers: Everyone
Abstract: This paper presents DrugEHRQA, the first question answering dataset containing question-answer pairs drawn from both the structured tables and the unstructured notes of a publicly available Electronic Health Record (EHR). EHRs contain patient records, stored in structured tables as well as unstructured clinical notes. The information in structured and unstructured EHR records is not strictly disjoint: information in one source may duplicate, contradict, or add context to information in the other. This presents a rich opportunity to study question answering (QA) models that combine reasoning over both structured and unstructured data. Additionally, we propose a novel methodology that automatically generates a large QA dataset by retrieving answers from both structured and unstructured EHR records. The automatically generated dataset contains over 70,000 question-answer pairs for medication-related queries. We validate the dataset on each modality individually using state-of-the-art QA models. To handle complex, nested queries, we use Relation-Aware Schema Encoding and Linking for Text-to-SQL Parsers (RAT-SQL), the first time it has been applied to EHR data. Finally, we introduce a rule-based method to obtain multi-modal answers by combining the answers from the different modalities. Our goal is to provide a benchmark dataset for multi-modal QA systems, and to open up new avenues of research in improving question answering over EHR structured data by using context from unstructured clinical data.