Falcon Medical Visual Question Answering

Published: 01 Jan 2025, Last Modified: 29 Jul 2025. Venue: AAAI 2025. License: CC BY-SA 4.0
Abstract: Vision-Language Models (VLMs) bridge the gap between visual and textual data, enabling multimodal tasks such as Visual Question Answering (VQA). Leveraging this capability, Medical VQA systems have the potential to transform clinical decision-making by allowing healthcare providers to query medical images—such as X-rays, MRIs, and CT scans—and receive rapid, informed responses, thereby speeding up diagnosis and treatment planning. In this work, we introduce Falcon Med-VQA, a generative VQA system designed to interpret visual and textual medical data and produce free-form answers to medical questions. By combining a vision-language model with a dynamic model selection mechanism, Falcon Med-VQA ensures relevance and precision in its responses. The system is equipped with an intuitive user interface that displays top answers with confidence scores, enhances explainability through medical terminology extraction, and offers attention map visualizations for improved interpretability. Our experiments demonstrate that Falcon Med-VQA achieves performance comparable to specialized models and outperforms recent generative approaches on a key benchmark.
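Below is a minimal, hypothetical sketch of the two ideas the abstract names: routing a query to a modality-specific model (dynamic model selection) and surfacing the top free-form answers with confidence scores. The model names, routing rule, and scoring are illustrative assumptions for exposition, not the authors' actual implementation.

```python
# Hypothetical sketch: route a medical VQA question to a modality-specific
# answer generator, then rank free-form answers by confidence score.
# All model functions and scores below are placeholders, not the paper's system.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Answer:
    text: str
    confidence: float  # confidence score displayed alongside each top answer


# Placeholder generators standing in for fine-tuned VLM checkpoints.
def xray_model(question: str) -> List[Answer]:
    return [Answer("no acute cardiopulmonary abnormality", 0.87),
            Answer("mild cardiomegaly", 0.41)]


def mri_model(question: str) -> List[Answer]:
    return [Answer("no evidence of acute infarct", 0.79)]


MODEL_REGISTRY: Dict[str, Callable[[str], List[Answer]]] = {
    "xray": xray_model,
    "mri": mri_model,
}


def select_model(modality: str) -> Callable[[str], List[Answer]]:
    """Dynamic model selection: pick the generator suited to the image modality."""
    return MODEL_REGISTRY.get(modality, xray_model)  # fall back to a default


def answer_question(modality: str, question: str, top_k: int = 3) -> List[Answer]:
    """Generate free-form answers and return the top-k ranked by confidence."""
    model = select_model(modality)
    answers = model(question)
    return sorted(answers, key=lambda a: a.confidence, reverse=True)[:top_k]


if __name__ == "__main__":
    for ans in answer_question("xray", "Is there evidence of cardiomegaly?"):
        print(f"{ans.confidence:.2f}  {ans.text}")
```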