Keywords: Automatic report generation, canine cardiomegaly diagnosis, multimodal processing
TL;DR: This paper introduces a Multimodal Clinical Integration Transformer (MCIT), a novel deep learning architecture for the automated generation of veterinary radiology reports.
Abstract: This paper introduces the Multimodal Clinical Integration Transformer (MCIT), a novel deep learning architecture for the automated generation of veterinary radiology reports. Manual reporting for conditions such as canine cardiomegaly is subjective and time-consuming. MCIT makes two contributions: (1) it is multimodal, processing both radiographic images and structured clinical history to produce context-aware diagnostics; (2) it integrates predicted clinical findings directly into the multimodal context, grounding report generation in specific abnormalities. The model is trained and evaluated on a local dataset of 5,000 canine chest X-rays with corresponding reports. MCIT achieves a BLEU-4 score of 0.510 and a Clinical F1 score of 0.920, indicating its potential to improve the efficiency and accuracy of veterinary diagnostics.
Submission Number: 127