LRTA-BioMIC: Lightweight Region-Text Aligned BioMIC-BART for Chest X-ray Report Generation

ACL ARR 2025 February Submission5896 Authors

16 Feb 2025 (modified: 09 May 2025), ACL ARR 2025 February Submission, CC BY 4.0
Abstract: The global shortage of radiologists is a major challenge. Radiology is vital for diagnosing and treating diseases, especially of the lungs and heart, using imaging such as X-rays. To address this shortage and the resulting workload, we introduce $\textit{\textbf{L}ightweight \textbf{R}egion-\textbf{T}ext \textbf{A}ligned \textbf{BioMIC}-BART} (\textbf{LRTA-BioMIC})$, which generates chest X-ray reports from X-ray images. $\textit{LRTA-BioMIC}$ is a computationally efficient, domain-specific, region-guided, text-aligned language model that integrates tagger information and X-ray embeddings from a ViT through cross-attention at every layer of the BioMIC-BART encoder to generate radiology reports (Findings and Impression). On the $\textit{IU-Xray}$ dataset, our model improves BLEU-4 by $\textbf{9.71\%}$ and ROUGE-L by $\textbf{0.9\%}$ over the previous state-of-the-art models, $\textit{COMG}$ and $\textit{KGVL-BART}$. $\textit{LRTA-BioMIC}$ also demonstrates competitive performance on the $\textit{MIMIC-CXR-JPG}$ dataset, with a $\textbf{1.60\%}$ increase in BLEU-4 and a $\textbf{3.53\%}$ decrease in ROUGE-L compared to $\textit{RECAP}$, the previous state of the art. We will make our code and resources publicly available.
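As a rough illustration of the fusion described in the abstract (text-side encoder states attending to ViT patch embeddings through cross-attention at every encoder layer), the following minimal sketch may help. It is not the authors' implementation: the function names (`cross_attention`, `encode`), the toy dimensions, the identity query/key/value projections, the residual connection, and the omission of self-attention and feed-forward sublayers are all assumptions made to keep the example short and dependency-free.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(text_states, image_patches):
    """Single-head scaled dot-product cross-attention with a residual connection.

    text_states:   list of text-token vectors (used as queries)
    image_patches: list of image-patch vectors (used as keys and values)
    Learned Q/K/V projections are omitted (identity); this is an assumption.
    """
    d = len(text_states[0])
    fused = []
    for q in text_states:
        # Similarity of this text token to every image patch.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in image_patches]
        weights = softmax(scores)
        # Attention-weighted average of the patch vectors.
        context = [sum(w * v[j] for w, v in zip(weights, image_patches))
                   for j in range(d)]
        # Residual connection: add the visual context to the text state.
        fused.append([qi + ci for qi, ci in zip(q, context)])
    return fused


def encode(text_states, image_patches, num_layers=2):
    """Apply cross-attention to the same image patches at every layer."""
    for _ in range(num_layers):
        text_states = cross_attention(text_states, image_patches)
    return text_states


# Toy inputs (hypothetical): 2 text-token vectors, 3 image patches, dimension 4.
text = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
patches = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
fused = encode(text, patches)
print(len(fused), len(fused[0]))  # prints: 2 4
```

The output keeps the shape of the text-side input, so in a full model the fused states could be passed on to the decoder unchanged; only their content is conditioned on the image patches.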
Paper Type: Short
Research Area: NLP Applications
Research Area Keywords: Multimodality, Medical NLP, Report Generation
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Approaches low compute settings-efficiency, Publicly available software and/or pre-trained models
Languages Studied: English
Submission Number: 5896