MedThink: Explaining Medical Visual Question Answering via Multimodal Decision-Making Rationale

ACL ARR 2024 June Submission 5400 Authors

16 Jun 2024 (modified: 22 Jul 2024) · ACL ARR 2024 June Submission · Readers: Everyone · License: CC BY 4.0
Abstract: Medical Visual Question Answering (MedVQA) provides language responses to image-based medical inquiries, facilitating more accurate diagnoses. However, existing MedVQA methods lack interpretability and transparency. To address this, we introduce a semi-automated annotation process and create two new benchmark datasets, R-RAD and R-SLAKE, which incorporate outputs from multimodal language models alongside human annotations. We further develop a framework, MedThink, that fine-tunes lightweight generative models with medical decision-making rationales. The framework employs three distinct strategies to generate decision outcomes and their corresponding rationales, making the medical decision-making process explicit during inference. MedThink achieves 83.5% accuracy on R-RAD and 86.3% on R-SLAKE, outperforming current baselines. The datasets and code will be released.
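To make the rationale-augmented fine-tuning setup concrete, the sketch below shows one plausible way a training target could pair a decision with its rationale under three generation strategies. This is a minimal illustration only: the function name, the template wording, and the specific strategy names are assumptions for exposition, not the MedThink implementation.

```python
# Hypothetical sketch of rationale-augmented target construction for MedVQA
# fine-tuning. Strategy names and templates are illustrative assumptions,
# not taken from the MedThink paper.

def build_target(answer: str, rationale: str,
                 mode: str = "answer_then_rationale") -> str:
    """Compose the generation target under one of three assumed strategies."""
    if mode == "answer_then_rationale":
        return f"The answer is {answer}. Rationale: {rationale}"
    if mode == "rationale_then_answer":
        return f"Rationale: {rationale} Therefore, the answer is {answer}."
    if mode == "answer_only":
        return f"The answer is {answer}."
    raise ValueError(f"Unknown mode: {mode}")

# Placeholder example record (field names are illustrative).
example = {
    "image_id": "synpic54321",
    "question": "Is there evidence of pneumothorax in this chest X-ray?",
    "answer": "No",
    "rationale": "The lung fields are fully expanded with no visible pleural line.",
}

print(build_target(example["answer"], example["rationale"]))
```

In such a setup, the choice of strategy controls whether the model learns to commit to a decision before or after articulating its reasoning, which is the kind of trade-off the abstract's three strategies presumably explore.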
Paper Type: Short
Research Area: Question Answering
Research Area Keywords: biomedical QA, multimodal QA, interpretability, vision question answering
Contribution Types: Model analysis & interpretability, Data resources, Data analysis
Languages Studied: English
Submission Number: 5400