Keywords: Clinical Natural Language Inference, Clinical Reasoning, Large Language Models, Agentic Frameworks, Modular Reasoning, Verification, Safety-Critical NLP
Abstract: Large language models can produce fluent judgments for clinical natural language inference, yet they frequently fail when the decision requires the correct inferential schema rather than surface matching. We introduce \textit{CARENLI}, a compartmentalised agentic framework that routes each premise–statement pair to a reasoning family and then applies a specialised solver with explicit verification and targeted refinement. We evaluate on an expanded CTNLI benchmark of 200 instances spanning four reasoning families: \textsc{Causal Attribution}, \textsc{Compositional Grounding}, \textsc{Epistemic Verification}, and \textsc{Risk State Abstraction}. Across four contemporary backbone models, \textit{CARENLI} improves mean accuracy from about 23\% with direct prompting to about 57\%, a gain of roughly 34 points, with the largest benefits on structurally demanding reasoning types. These results support compartmentalisation plus verification as a practical route to more reliable and auditable clinical inference.
Paper Type: Long
Research Area: Semantics: Lexical, Sentence-level Semantics, Textual Inference and Other areas
Research Area Keywords: textual inference, natural language inference, semantic reasoning, logical reasoning, model evaluation, error analysis, clinical NLP
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Data analysis
Languages Studied: English
Submission Number: 6098