Leveraging Evidence-Guided LLMs to Enhance Trustworthy Depression Diagnosis

Published: 19 Aug 2025, Last Modified: 12 Oct 2025 · BHI 2025 · CC BY 4.0
Confirmation: I have read and agree with the IEEE BHI 2025 conference's submission policy on behalf of myself and my co-authors.
Keywords: Large Language Models, Diagnostic Reasoning, Mental Health AI, DSM-5, Knowledge Graphs, Explainable AI, Evidence-Guided Reasoning
Abstract: Large language models (LLMs) show promise in automating clinical diagnosis, yet their opaque decision-making and limited alignment with diagnostic standards hinder trust and clinical adoption. We address this challenge with a two-stage diagnostic framework that enhances transparency, trustworthiness, and reliability. First, we introduce evidence-guided diagnostic reasoning (EGDR), which guides LLMs to generate structured diagnostic hypotheses by interleaving evidence extraction with logical reasoning grounded in DSM-5 criteria. Second, we propose a Diagnosis Confidence Scoring (DCS) module that evaluates the factual accuracy and logical consistency of generated diagnoses through two interpretable metrics: a Knowledge Attribution Score (KAS) and a Logic Consistency Score (LCS). Evaluated on the D4 dataset with pseudo-labels, EGDR outperforms direct in-context prompting (Direct) and Chain-of-Thought (CoT) prompting across five LLMs. For instance, on OpenBioLLM, EGDR improves accuracy from 0.31 (Direct) to 0.76 and DCS from 0.50 to 0.67; on MedLlama, DCS rises from 0.58 (CoT) to 0.77. Overall, EGDR yields gains of up to +0.45 in accuracy and +0.36 in DCS over the baselines, offering a clinically grounded, interpretable foundation for trustworthy AI-assisted diagnosis.
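The abstract describes DCS as a combination of two interpretable metrics, KAS (grounding of extracted evidence in DSM-5 criteria) and LCS (consistency of the reasoning chain). Below is a minimal Python sketch of how such a composite score could be structured; the function names, the equal weighting, the data shapes, and the toy inputs are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of a Diagnosis Confidence Scoring (DCS) module:
# KAS = fraction of extracted evidence statements mapped to a DSM-5 criterion,
# LCS = fraction of reasoning steps that cite evidence and are entailed by it.
# All names, weights, and inputs here are assumptions for illustration only.
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class ReasoningStep:
    evidence_ids: list[str]      # evidence statements cited by this step
    conclusion_supported: bool   # whether the step's conclusion follows from them

def knowledge_attribution_score(evidence_to_criterion: dict[str, str | None]) -> float:
    """Fraction of extracted evidence grounded in some DSM-5 criterion (assumed definition)."""
    if not evidence_to_criterion:
        return 0.0
    grounded = sum(1 for c in evidence_to_criterion.values() if c is not None)
    return grounded / len(evidence_to_criterion)

def logic_consistency_score(steps: list[ReasoningStep]) -> float:
    """Fraction of reasoning steps that cite evidence and are supported by it (assumed definition)."""
    if not steps:
        return 0.0
    consistent = sum(1 for s in steps if s.evidence_ids and s.conclusion_supported)
    return consistent / len(steps)

def diagnosis_confidence_score(kas: float, lcs: float, alpha: float = 0.5) -> float:
    """Weighted combination of KAS and LCS; the equal weighting is an assumption."""
    return alpha * kas + (1 - alpha) * lcs

# Toy usage: two of three evidence items map to DSM-5 criteria A1/A4,
# and two of three reasoning steps are evidence-backed.
kas = knowledge_attribution_score({"e1": "A1", "e2": "A4", "e3": None})
lcs = logic_consistency_score([
    ReasoningStep(["e1"], True),
    ReasoningStep(["e2"], True),
    ReasoningStep([], False),
])
print(round(diagnosis_confidence_score(kas, lcs), 2))
```

The split into two independently computed sub-scores is what makes the confidence estimate interpretable: a low KAS flags diagnoses built on evidence that cannot be attributed to DSM-5 criteria, while a low LCS flags reasoning chains whose conclusions do not follow from the cited evidence.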
Track: 4. Clinical Informatics
Registration Id: 6QNYQKQMFZ3
Submission Number: 370