Keywords: Visual Question Answering, Medical Imaging, Datasets
TL;DR: We introduce a large-scale medical VQA dataset with 3.4M LLM-generated, clinically grounded QA pairs from MIMIC-CXR to enable realistic, radiologist-style question answering from chest X-rays.
Abstract: The interpretation of chest X-rays (CXRs) is a critical yet time-consuming task in clinical radiology, often limited by the availability of expert radiologists. To address this challenge, we introduce a new large-scale medical Visual Question Answering (VQA) dataset derived from the MIMIC-CXR database, containing over 3.2 million question-answer (QA) pairs across 15 clinically relevant categories. The dataset enables the training of multimodal models capable of answering a broad range of diagnostic questions directly from chest X-ray images. Unlike prior datasets, ours generates QA pairs with a large language model, LLaMA 3.1, guided by a carefully crafted prompt structure to produce rich, nuanced, and evidence-based textual answers grounded in radiology reports. We address limitations of existing datasets, such as templated responses and linguistic monotony, by ensuring diversity, completeness, and clinical fidelity in our QA pairs. To support benchmarking on this new dataset, we provide initial baseline models and training strategies designed to evaluate visual and textual reasoning performance in the medical VQA setting. Extensive experiments demonstrate the effectiveness of our approach across multiple evaluation metrics, establishing a strong benchmark for future research in medical VQA. Our dataset and baseline models pave the way for clinically meaningful AI tools that can assist radiologists by answering complex diagnostic questions with accuracy and interpretability.
Primary Subject Area: Generative Models
Secondary Subject Area: Application: Radiology
Registration Requirement: Yes
Reproducibility: https://github.com/LightVED-prhlt/MIMIC-CXR-VQA-Dataset_Creation
Visa & Travel: No
Read CFP & Author Instructions: Yes
Originality Policy: Yes
Single-blind & Not Under Review Elsewhere: Yes
LLM Policy: Yes
Submission Number: 95