Abstract: The modern medical community seeks precise, multimodal interpretability: clinicians want models that explicitly connect image regions to diagnostic outcomes and reason in natural language. Large Multimodal Models (LMMs) are rapidly advancing open-domain vision-language reasoning, yet progress in medical visual question answering (Med-VQA) remains limited by two persistent bottlenecks: the scarcity of large-scale region-grounded supervision and the high cost of continuous radiologist oversight. We present an automated Chest X-ray Med-VQA generation-validation pipeline and a grounded Chest X-ray (CXR) dataset, GIV-CXR, built on top of the Chest ImaGenome dataset. The pipeline couples LMM-based question-answer generation with automated validation, scaling grounded data generation while preserving clinical reliability: prompts incorporating domain experts' insights constrain question-answer generation to clinically sound content, and Large Language Model (LLM) evaluators verify the reliability of the model-generated question-answer pairs. GIV-CXR is a large-scale dataset comprising 20,534 images from Chest ImaGenome, annotated with 81,257 bounding boxes, yielding 354,293 question-answer pairs. The generation prompts are designed strategically to elicit in-depth reasoning for effective grounding. Off-the-shelf LMMs underperformed on a sampled test set, highlighting their lack of grounding capabilities; after fine-tuning on our dataset, the models demonstrate significantly better reasoning and grounding, enhancing their interpretability. We will release the resources along with detailed instructions and ethical-use guidelines upon acceptance.
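For illustration only, a minimal sketch of the generation-validation loop the abstract describes. The function names (generate_qa, judge_qa), the prompt contents, and the acceptance check are hypothetical stand-ins, not the authors' actual implementation, which the abstract does not specify.

```python
# Minimal sketch of an LMM-generation / LLM-judge validation loop,
# assuming Chest ImaGenome-style (image, bounding box, finding) annotations.
# generate_qa and judge_qa are hypothetical stubs for the real model calls.

from dataclasses import dataclass

@dataclass
class QAPair:
    question: str
    answer: str
    bbox: tuple  # (x, y, w, h) region the question is grounded in

def generate_qa(image_id: str, bbox: tuple, finding: str) -> QAPair:
    """Hypothetical LMM call: prompt a multimodal model with the image
    region and its finding label to produce a grounded QA pair."""
    question = f"What abnormality is visible in the region {bbox}?"
    answer = f"The region shows {finding}."
    return QAPair(question, answer, bbox)

def judge_qa(qa: QAPair, finding: str) -> bool:
    """Hypothetical LLM-as-a-judge call: check the generated answer
    against the source annotation and reject inconsistent pairs."""
    return finding.lower() in qa.answer.lower()

def build_dataset(annotations):
    """Keep only QA pairs that pass validation, mirroring the
    generation-validation pipeline at a high level."""
    kept = []
    for image_id, bbox, finding in annotations:
        qa = generate_qa(image_id, bbox, finding)
        if judge_qa(qa, finding):
            kept.append(qa)
    return kept

if __name__ == "__main__":
    demo = [("img_001", (120, 80, 64, 64), "left lower lobe opacity")]
    print(build_dataset(demo))
```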
Paper Type: Long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Research Area Keywords: Medical Visual Question Answering, Large Multimodal Models, Interpretability, Visual Grounding, LLM-as-a-judge
Contribution Types: Approaches to low-resource settings, Data resources
Languages Studied: English
Reassignment Request Area Chair: This is not a resubmission
Reassignment Request Reviewers: This is not a resubmission
A1 Limitations Section: This paper has a limitations section.
A2 Potential Risks: Yes
A2 Elaboration: Section 6
B Use Or Create Scientific Artifacts: Yes
B1 Cite Creators Of Artifacts: Yes
B1 Elaboration: Section 2
B2 Discuss The License For Artifacts: No
B2 Elaboration: We will discuss licensing with the dataset and resource release upon acceptance
B3 Artifact Use Consistent With Intended Use: No
B3 Elaboration: We will address intended use with the dataset and resource release upon acceptance
B4 Data Contains Personally Identifying Info Or Offensive Content: No
B4 Elaboration: We have used a public, open-source dataset
B5 Documentation Of Artifacts: No
B5 Elaboration: We will provide documentation with the dataset and resource release upon acceptance
B6 Statistics For Data: Yes
B6 Elaboration: Appendix of the paper
C Computational Experiments: Yes
C1 Model Size And Budget: N/A
C1 Elaboration: Some proprietary servers were used, for which details cannot be provided. We can give some details in the appendix upon acceptance
C2 Experimental Setup And Hyperparameters: Yes
C2 Elaboration: We discuss the models used and all hyperparameters
C3 Descriptive Statistics: Yes
C3 Elaboration: Appendix of the paper
C4 Parameters For Packages: Yes
C4 Elaboration: Section 3
D Human Subjects Including Annotators: Yes
D1 Instructions Given To Participants: N/A
D1 Elaboration: We can give some details in the appendix upon acceptance
D2 Recruitment And Payment: N/A
D2 Elaboration: That was an internal decision which cannot be shared. The annotators were domain experts, so the pay was adequate.
D3 Data Consent: N/A
D3 Elaboration: That was an internal decision which cannot be shared. The annotators were domain experts, so there was a formal agreement.
D4 Ethics Review Board Approval: N/A
D5 Characteristics Of Annotators: N/A
E Ai Assistants In Research Or Writing: Yes
E1 Information About Use Of Ai Assistants: Yes
E1 Elaboration: Section 6
Author Submission Checklist: Yes
Submission Number: 1344