Rad-Phi4-Vision-CXR: A Compact Multimodal Assistant for Versatile Radiology Workflows
Keywords: multimodal small language models, chest X-rays, radiology report generation, causal exploration
Track: Proceedings
Abstract: The integration of artificial intelligence into radiology underscores the need for efficient models capable of supporting a wide range of clinical tasks. We introduce Rad-Phi4-Vision-CXR, a compact multimodal vision-language model designed to seamlessly integrate into radiology workflows for chest X-rays. It supports radiology report generation, fine-grained visual question answering (VQA) for abnormalities and tubes/lines (including presence and placement), and grounding capabilities for anatomies, pathologies, and medical devices. Beyond these tasks, we propose a capability for findings generation with causal exploration of radiology findings and differential diagnosis, enabling the model to affirm findings or rule out conditions, thereby enhancing its utility in clinical decision-making. Rad-Phi4-Vision-CXR achieves state-of-the-art performance on the ReXrank benchmark for report generation, VQA, and grounding. Its compact architecture provides a scalable, high-performance solution for AI-driven radiology.
General Area: Models and Methods
Specific Subject Areas: Foundation Models, Explainability & Interpretability, Causal Inference & Discovery
Data And Code Availability: No
Ethics Board Approval: No
Entered Conflicts: I confirm the above
Anonymity: I confirm the above
Submission Number: 68