Caveats in Generating Medical Imaging Labels from Radiology Reports with Natural Language Processing
Keywords: medical imaging, radiology reports, machine learning, NLP
TL;DR: In medical imaging, image labels and report labels differ due to the existence of clinically non-actionable findings.
Abstract: Acquiring high-quality annotations in medical imaging is usually a costly process. Automatic label extraction with natural language processing (NLP) has emerged as a promising way to bypass the need for expert annotation. Despite its convenience, the limitations of this approximation have not been carefully examined and are not well understood. Using a challenging set of 1,000 chest X-ray studies and their corresponding radiology reports, we show that there is a surprisingly large discrepancy between what radiologists visually perceive and what they clinically report. Furthermore, with inherently flawed reports as ground truth, state-of-the-art medical NLP fails to produce high-fidelity labels.