PRISM: Physician Rules Integrated with Small large language Models for probable diagnoses associated with Abdominal Pain
Keywords: GenAI, Conversational mHealth app, Probable Diagnosis, Benchmarking, Small LLMs, Narrow GenAI Clinical Application
TL;DR: PRISM (Physician Rules Integrated with Small Models) is a hybrid conversational system designed to augment clinical workflows for diagnosing abdominal pain.
Abstract: Abdominal pain is a significant diagnostic challenge in high-volume, resource-constrained Outpatient Departments (OPDs), where it contributes to increased physician burden and potential diagnostic delays. End-to-end Large Language Model (LLM) workflows offer conversational flexibility but lack the reliability and transparency required for high-stakes clinical diagnoses (source). Traditional rule-based systems, by contrast, are transparent but rigid. To address this gap, we introduce PRISM (Physician Rules Integrated with Small Models), a hybrid conversational system designed to augment clinical workflows for diagnosing abdominal pain. PRISM uses a physician-guided rule-based engine for diagnostic reasoning, together with a composite of small open-source large language models (small-LLMs) for patient-facing empathic conversation and targeted non-diagnostic tasks (relevant keyword extraction with or without Unified Medical Language System (UMLS) terms). PRISM’s focused, less autonomous, microservices-based architecture ensures both clinical robustness and a user-centric design. PRISM achieves a top-5 accuracy of 85%-100% and a Mean Reciprocal Rank (MRR) of 0.596-0.603 on 322 physician-curated simulated patient question/answer pairs, outperforming the best end-to-end small-LLM, which reaches a top-5 accuracy of 65% and an MRR of 0.422. We comprehensively benchmarked the empathy, clarity, and helpfulness of candidate small-LLMs to select the best model for PRISM’s empathy service. Likewise, we benchmarked small-LLMs for PRISM’s clinical keyword/term extraction service using a synthetic patient Q/A dataset (SynD1) and a physician-curated simulated Q/A dataset (SimD2). Additionally, clinical keywords/terms were extended with curated, standardized, and harmonized UMLS terms to evaluate PRISM’s performance gain over end-to-end small-LLMs.
We introduced a multistep keyword-extraction approach that leverages curated UMLS terms to provide focused, contextually relevant knowledge to small-LLMs, improving the clinical keyword extraction service. PRISM offers a structured, physician-in-the-loop, and resource-efficient blueprint for deploying practical generative AI applications in real-world clinical workflows.
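The two ranking metrics reported in the abstract, top-5 accuracy and Mean Reciprocal Rank, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names and the three example cases are hypothetical, and the sketch assumes each case has exactly one reference diagnosis and that a missing diagnosis contributes a reciprocal rank of 0.

```python
from typing import Sequence


def reciprocal_rank(ranked: Sequence[str], truth: str) -> float:
    """Return 1/rank of the reference diagnosis, or 0.0 if it is not listed."""
    for position, candidate in enumerate(ranked, start=1):
        if candidate == truth:
            return 1.0 / position
    return 0.0


def top_k_accuracy(predictions: Sequence[Sequence[str]],
                   truths: Sequence[str], k: int = 5) -> float:
    """Fraction of cases whose reference diagnosis is among the first k candidates."""
    hits = sum(truth in ranked[:k] for ranked, truth in zip(predictions, truths))
    return hits / len(truths)


def mean_reciprocal_rank(predictions: Sequence[Sequence[str]],
                         truths: Sequence[str]) -> float:
    """Average of the per-case reciprocal ranks."""
    total = sum(reciprocal_rank(r, t) for r, t in zip(predictions, truths))
    return total / len(truths)


# Hypothetical ranked outputs for three simulated cases (illustrative only).
predictions = [
    ["appendicitis", "gastritis", "cholecystitis"],
    ["gastritis", "pancreatitis", "appendicitis"],
    ["cholecystitis", "gastritis", "pancreatitis"],
]
truths = ["appendicitis", "pancreatitis", "renal colic"]

print(top_k_accuracy(predictions, truths, k=5))
print(mean_reciprocal_rank(predictions, truths))
```

In this toy example the reference diagnosis is ranked first in one case, second in another, and absent in the third, so MRR penalizes the second case while top-5 accuracy does not. This is why a system can have a high top-5 accuracy alongside a moderate MRR.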
Submission Number: 175