PRISM: Physician Rules Integrated with Small large language Models for probable diagnoses associated with Abdominal Pain

Gautam Ahuja; Ayush Agarwal; Hara Prasad Mishra; Samagra Agrawal; Rik Ganguly; Zonunmawia; Akshay Sharma; Vatsal Batra; Bableen Kaur; Siddhant Poudyal; Himani Balutia; Sagarika; Sanjana Ahuja; Kedar Natarajan; Partha Pratim Das; Ramesh Jain; Partha Pratim Chakrabarti; Anurag Agrawal; Govind Makharia; Rintu Kutum


Published: 12 Oct 2025, Last Modified: 12 Nov 2025 · GenAI4Health 2025 Poster · CC BY 4.0

Keywords: GenAI, Conversational mHealth app, Probable Diagnosis, Benchmarking, Small LLMs, Narrow GenAI Clinical Application

TL;DR: PRISM (Physician Rules Integrated with Small Models) is a hybrid conversational system designed for integration into clinical workflows for diagnosing abdominal pain.

Abstract: Abdominal pain is a significant diagnostic challenge in high-volume, resource-constrained Outpatient Departments (OPDs), where it contributes to increased physician burden and potential diagnostic delays. End-to-end Large Language Model (LLM) workflows offer conversational flexibility but lack the reliability and transparency required for high-stakes clinical diagnoses. Traditional rule-based systems, on the other hand, are transparent but rigid. To address this, we introduce PRISM (Physician Rules Integrated with Small Models), a hybrid conversational system designed for integration into clinical workflows for diagnosing abdominal pain. PRISM uses a physician-guided rule-based engine for diagnostic reasoning and an ensemble of small open-source large language models (SLMs) for patient-facing (empathic) conversation and targeted non-diagnostic tasks (relevant keyword extraction with or without UMLS). PRISM's design is focused and minimally autonomous, and its microservices-based architecture ensures both clinical robustness and a user-centric design. PRISM achieves a top-5 accuracy of 80–100% and a Mean Reciprocal Rank (MRR) of 0.596–0.603 on 322 physician-curated simulated patient question/answer pairs, outperforming the best end-to-end SLMs (top-5 accuracy of 65% and an MRR of 0.422). Through comprehensive benchmarking of empathy, clarity, and helpfulness, we select the best SLMs for integration with PRISM's empathy service. Likewise, SLMs were benchmarked for PRISM's clinical keyword/term extraction service using a synthetic patient Q/A dataset (SynD1) and a physician-curated simulated Q/A dataset (SimD2). Additionally, clinical keywords/terms were extended using curated, standardized, and harmonized Unified Medical Language System (UMLS) terms to evaluate the performance gain in PRISM compared to end-to-end SLMs.
We introduce a multistep keyword-extraction approach that leverages curated UMLS terms to provide focused, contextually relevant knowledge to SLMs, improving the clinical keyword extraction service. In summary, PRISM offers a structured, physician-in-the-loop, and resource-efficient blueprint for deploying practical generative AI applications in real-world clinical workflows.
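
For readers unfamiliar with the two ranking metrics reported in the abstract, the following is a minimal sketch of how top-k accuracy and Mean Reciprocal Rank are commonly computed over ranked lists of candidate diagnoses. It is not the authors' evaluation code: the function names, the example diagnoses, and the convention that a missing correct diagnosis contributes a reciprocal rank of 0 are all assumptions made for illustration.

```python
from typing import Sequence


def reciprocal_rank(ranked: Sequence[str], truth: str) -> float:
    """Return 1/position of the true diagnosis in the ranked list.

    Assumption: if the true diagnosis is absent, the score is 0.0.
    """
    for position, candidate in enumerate(ranked, start=1):
        if candidate == truth:
            return 1.0 / position
    return 0.0


def top_k_accuracy(
    predictions: Sequence[Sequence[str]], truths: Sequence[str], k: int = 5
) -> float:
    """Fraction of cases whose true diagnosis appears in the first k candidates."""
    hits = sum(1 for ranked, truth in zip(predictions, truths) if truth in ranked[:k])
    return hits / len(truths)


def mean_reciprocal_rank(
    predictions: Sequence[Sequence[str]], truths: Sequence[str]
) -> float:
    """Average of the per-case reciprocal ranks."""
    total = sum(reciprocal_rank(r, t) for r, t in zip(predictions, truths))
    return total / len(truths)


# Hypothetical ranked outputs for three simulated cases (illustrative only).
example_predictions = [
    ["gastritis", "appendicitis", "pancreatitis"],  # truth at rank 1
    ["cholecystitis", "gastritis"],                 # truth at rank 2
    ["irritable bowel syndrome", "gastritis"],      # truth not listed
]
example_truths = ["gastritis", "gastritis", "appendicitis"]

if __name__ == "__main__":
    print(top_k_accuracy(example_predictions, example_truths, k=5))
    print(mean_reciprocal_rank(example_predictions, example_truths))
```

In this toy example two of the three cases contain the true diagnosis within the top five, and the reciprocal ranks are 1, 1/2, and 0. MRR rewards placing the correct diagnosis near the top of the list, whereas top-5 accuracy only asks whether it appears among the first five candidates.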

Submission Number: 175
