Ontology-Guided Prompting for Reasoning in Multimodal Vision-Language Models: An Application to Rare Dental Disease

Published: 14 Jun 2025, Last Modified: 16 Aug 2025 · MKLM 2025 · CC BY 4.0
Submission Type: Archive
Keywords: Vision-Language Models (VLMs), Symbolic Prompting, Ontology-Guided Reasoning, Chain-of-Thought (CoT), Multimodal Learning, Rare Diseases, Clinical Decision Support
TL;DR: We guide vision-language models with ontology-driven prompts to enable structured reasoning, demonstrated on rare dental disease images.
Abstract: Vision-language models (VLMs) have demonstrated strong generalization across multimodal tasks, enabling applications in medical image interpretation, robotic perception, and education. However, their lack of grounding in domain-specific knowledge often leads to hallucinations, especially on out-of-distribution data where precision and explainability are critical. Prompt engineering provides a lightweight alternative to fine-tuning for adapting VLMs to specialized tasks, but it remains fragile and offers no guarantees of factual accuracy. Fine-tuning, while more robust, is computationally expensive and often impractical in privacy-sensitive environments. We focus on a high-stakes application: symptom-level reasoning in rare dental diseases, such as dental agenesis and enamel defects. These conditions present diagnostic challenges due to low prevalence, overlapping symptoms, and limited labeled data, making them an ideal testbed for evaluating the adaptability of general-purpose VLMs. We propose an ontology-guided prompting framework that enables interpretable, step-by-step reasoning without model retraining. A domain-specific ontology, created with clinical experts, models the rare-disease domain of our dataset, including disease–symptom relationships, and supports the generation of chain-of-thought (CoT) prompts. These prompts guide VLMs such as MiniGPT-4, LLaVA, and BLIP-2 to produce medically grounded reasoning from dental images. Our method leverages the models' latent medical knowledge through symbolic constraints and semantic filtering based on ontology terms. We evaluate three prompting strategies (zero-shot, human-feedback, and ontology-guided) and assess reasoning quality using F1 score, ontology coverage, and hallucination rate. Results show that ontology-guided prompting significantly improves factual alignment and reduces hallucinations, supporting safe and explainable VLM deployment in clinical domains.
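To make the pipeline described in the abstract concrete, the sketch below illustrates one plausible reading of it: building a CoT prompt whose reasoning steps are constrained to ontology terms, then scoring an output for ontology coverage and hallucination rate. This is a minimal sketch under stated assumptions, not the authors' implementation; the toy ontology and all names (`build_cot_prompt`, `ontology_coverage`, `hallucination_rate`) are hypothetical.

```python
# Hypothetical sketch of ontology-guided CoT prompting and the evaluation
# metrics named in the abstract. The ontology below is a toy stand-in for
# the expert-built disease-symptom ontology described in the paper.

# Toy disease-symptom ontology: each disease maps to canonical symptom terms.
ONTOLOGY = {
    "dental agenesis": ["missing teeth", "reduced tooth count", "spacing anomalies"],
    "enamel defect": ["enamel hypoplasia", "discoloration", "surface pitting"],
}

def build_cot_prompt(candidate_diseases):
    """Assemble a chain-of-thought prompt whose steps are restricted to
    ontology terms (the symbolic constraint)."""
    steps = []
    for disease in candidate_diseases:
        symptoms = ", ".join(ONTOLOGY[disease])
        steps.append(f"- Check for signs of {disease}: look for {symptoms}.")
    return (
        "You are examining a dental image. Reason step by step, using only\n"
        "the symptom terms listed below:\n"
        + "\n".join(steps)
        + "\nThen state which findings are present and which are absent."
    )

def extract_terms(text, vocabulary):
    """Semantic filtering: keep only ontology terms mentioned in the output."""
    text = text.lower()
    return {term for term in vocabulary if term in text}

def ontology_coverage(model_output, expected_terms):
    """Fraction of expected ontology terms the model actually mentioned."""
    found = extract_terms(model_output, expected_terms)
    return len(found) / len(expected_terms) if expected_terms else 0.0

def hallucination_rate(claimed_findings, vocabulary):
    """Fraction of claimed findings NOT grounded in the ontology vocabulary."""
    ungrounded = [f for f in claimed_findings if f.lower() not in vocabulary]
    return len(ungrounded) / len(claimed_findings) if claimed_findings else 0.0

if __name__ == "__main__":
    vocab = {s for symptoms in ONTOLOGY.values() for s in symptoms}
    print(build_cot_prompt(["dental agenesis", "enamel defect"]))
    # Stand-in VLM output for demonstration only:
    output = "I see missing teeth and enamel hypoplasia, plus a root fracture."
    print("coverage:", ontology_coverage(output, vocab))
    print("hallucination rate:",
          hallucination_rate(["missing teeth", "root fracture"], vocab))
```

In this reading, the ontology plays two roles: at prompt time it constrains the reasoning vocabulary, and at evaluation time the same term list serves as the reference set for coverage and hallucination scoring.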
Submission Number: 6