CXR Fact Encoder: Combining Large Language Models with Medical Knowledge for Enhanced Radiological Text Representation
Abstract: Recent advances in representation learning, although promising, often face challenges in specialized domains such as medicine. In particular, acquiring expert annotations for medical texts and images is notably burdensome because medical professionals have limited availability and time. Large Language Models (LLMs) offer a promising way to address this by automatically extracting annotations from radiology reports at scale. In this work, we exploit the potential of pairing LLMs with domain-specific knowledge, reducing the dependence on time-intensive human expert annotations and improving medical text representation. Specifically, we introduce a two-stage system that extracts and encodes facts from radiology reports using LLMs such as ChatGPT and T5, in tandem with specialized medical knowledge sources. As the cornerstone of this system, we present CXR Fact Encoder, a BERT-based model fine-tuned for the enhanced representation of chest X-ray radiology reports. Additionally, we illustrate the applicability of our method by introducing CXR Fact Encoder Score, a novel evaluation metric crafted specifically for radiology text generation that draws on all the elements of our two-stage system. Our evaluations show that the proposed system outperforms multiple baseline methods in tasks such as sentence ranking, natural language inference, and label extraction from radiology reports. We make our model weights, data, and code publicly available.
Paper Type: long
Research Area: Information Extraction
Contribution Types: NLP engineering experiment, Approaches to low-resource settings, Publicly available software and/or pre-trained models, Data resources
Languages Studied: English
Consent To Share Submission Details: On behalf of all authors, we agree to the terms above to share our submission details.