Multimodal Clinical Integration Transformer for Automated Veterinary Radiology Report Generation


Agents4Science 2025 Conference Submission 127 Authors

13 Sept 2025 (modified: 08 Oct 2025). Submitted to Agents4Science. CC BY 4.0

Keywords: Automatic report generation, canine cardiomegaly diagnosis, multimodal processing

TL;DR: This paper introduces a Multimodal Clinical Integration Transformer (MCIT), a novel deep learning architecture for the automated generation of veterinary radiology reports.

Abstract: This paper introduces the Multimodal Clinical Integration Transformer (MCIT), a deep learning architecture for automatically generating veterinary radiology reports. It addresses the subjective and time-consuming nature of manual reporting for conditions such as canine cardiomegaly. MCIT makes two key contributions: (1) it is multimodal, processing both radiographic images and structured clinical history to produce more context-aware diagnoses; and (2) it integrates predicted clinical findings directly into the multimodal context, grounding report generation in specific abnormalities. MCIT is trained and evaluated on a local dataset of 5,000 canine chest X-rays with corresponding reports, on which it achieves a BLEU-4 score of 0.510 and a Clinical F1 score of 0.920. These results suggest that MCIT could improve the efficiency and accuracy of veterinary diagnostics.
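The abstract reports a Clinical F1 score without defining it. One common convention, assumed here and not taken from the paper, is to reduce each generated and reference report to a set of finding labels (for example "cardiomegaly") and compute micro-averaged precision, recall, and F1 over those labels. The following is a minimal sketch of that computation; the function name `clinical_f1` is hypothetical, and the upstream labeler that would extract findings from free-text reports is not shown.

```python
from typing import List, Set, Tuple


def clinical_f1(predicted: List[Set[str]], reference: List[Set[str]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 over clinical finding labels.

    Hypothetical helper: assumes each report has already been reduced to a set
    of finding labels by some upstream labeler (not shown here).
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same number of reports")

    tp = fp = fn = 0
    for pred, ref in zip(predicted, reference):
        tp += len(pred & ref)  # findings present in both reports
        fp += len(pred - ref)  # findings generated but absent from the reference
        fn += len(ref - pred)  # reference findings the generated report missed

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy example with three reports (illustrative labels, not data from the paper).
generated = [{"cardiomegaly", "pleural effusion"}, {"cardiomegaly"}, set()]
reference = [{"cardiomegaly"}, {"cardiomegaly", "pulmonary edema"}, set()]
precision, recall, f1 = clinical_f1(generated, reference)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
# precision=0.667 recall=0.667 f1=0.667
```

Micro-averaging pools counts across all reports, so frequent findings such as cardiomegaly dominate the score. A macro-averaged variant (F1 per finding, then averaged) would weight rare findings more heavily. The abstract does not state which variant underlies the reported 0.920.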

Submission Number: 127
