MDACE: MIMIC Documents Annotated with Code Evidence

Anonymous

MDACE: MIMIC Documents Annotated with Code Evidence

Anonymous

16 Oct 2022 (modified: 05 May 2023)ACL ARR 2022 October Blind SubmissionReaders: Everyone

Keywords: CAC, medical code evidence, dataset, CNN, model interpretability

Abstract: The accuracy of Computer-Assisted Coding (CAC) systems has improved significantly in recent years, thanks to advances in machine learning technologies. Yet simply predicting a set of final codes for a patient encounter is insufficient as CAC systems are required to provide supporting textual evidence to justify the billing codes. A model able to produce accurate and reliable supporting evidence for each code would be a tremendous benefit. However, a human annotated code evidence corpus is extremely difficult to create because it requires specialized knowledge. In this paper, we introduce MDACE, the first publicly available code evidence dataset, which is built on a subset of the MIMIC-III clinical records. The dataset -- annotated by professional medical coders -- consists of 302 Inpatient charts with 3,934 evidence spans and 52 Profee charts with 5,563 evidence spans. We implemented several evidence extraction methods based on the EffectiveCAN model (Liu et al., 2021) to establish baseline performance on this dataset. MDACE can be used to evaluate code evidence extraction methods for CAC systems, as well as the accuracy and interpretability of deep learning models for multi-label classification. We believe that the release of MDACE will greatly improve the understanding and application of deep learning technologies for medical coding and document classification.

Paper Type: long

Research Area: Resources and Evaluation

0 Replies

Loading