Autoformalizing Biomedical Text into Verified Knowledge Graph Reasoning: A Neuro-Symbolic Architecture for Alzheimer's Disease

David Scott Lewis; Enrique Zueco

Published: 05 Mar 2026, Last Modified: 30 Apr 2026, Venue: ICLR 2026 Workshop LLM Reasoning, License: CC BY 4.0

Track: long paper (up to 10 pages)

Keywords: autoformalization, biomedical text, Alzheimer’s disease, verified knowledge graph, neuro-symbolic reasoning, typed logical predicates, Answer Set Programming, temporal logic checking, LLM proposer–verifier, symbolic solvers, clinical reasoning, biomarker sentence parsing, entity linking, retrieval-augmented formalization, multi-hop biomedical QA, chain-of-thought reasoning, neuro-symbolic CoT, inconsistency reduction, trial protocol verification, auditable decision support, constraint satisfaction, formal semantics

TL;DR: Autoformalizes Alzheimer’s biomedical text into a typed, verifiable AD knowledge graph: LLMs propose predicates; ASP + temporal logic verifiers enforce consistency. Benchmarks show less reasoning inconsistency and modest entity-F1 gains.

Abstract: Alzheimer’s disease (AD) research generates vast amounts of unstructured biomedical text (clinical protocols, biomarker studies, and mechanistic hypotheses) that remain disconnected from formal computational reasoning. We introduce a neuro-symbolic architecture that autoformalizes biomedical text into a typed, verifiable knowledge graph (AD-KG), enabling auditable reasoning over AD biomarkers, patient stratification, and trial-protocol verification. Large language models serve as proposers that translate natural-language descriptions into typed logical predicates, while Answer Set Programming (ASP) solvers and temporal-logic checkers serve as verifiers that enforce machine-checkable consistency. We evaluate three core capabilities on newly constructed benchmarks: (1) autoformalization of biomarker sentences, where retrieval-augmented formalization raises Entity F1 from 0.362 to 0.414 over an LLM-only baseline; (2) clinical reasoning on multi-hop questions, where Chain-of-Thought achieves the highest accuracy (86.7%) while Neuro-Symbolic CoT (NS-CoT) reduces the inconsistency rate from 81.1% to 45.6% at the cost of a lower verification rate; and (3) protocol verification on trial protocols, where all methods achieve 0.605 F1 with perfect recall, indicating that current symbolic verification does not yet differentiate itself from neural-only approaches at small scale. These results demonstrate that neuro-symbolic integration provides measurable benefits for inconsistency reduction in clinical reasoning, while highlighting areas where further development is needed for autoformalization and protocol verification.
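
The proposer–verifier split described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the predicate names, type signatures, and entity table are hypothetical, a hard-coded list stands in for the LLM proposer, and plain Python checks stand in for the ASP solver and temporal-logic checker.

```python
from dataclasses import dataclass

# Hypothetical predicate signatures and entity types (assumptions, not from the paper).
SIGNATURES = {
    "elevated_in": ("Biomarker", "Condition"),
    "precedes": ("Event", "Event"),
}
ENTITY_TYPES = {
    "p_tau181": "Biomarker",
    "alzheimers_disease": "Condition",
    "screening": "Event",
    "baseline_mri": "Event",
}


@dataclass(frozen=True)
class Predicate:
    name: str
    args: tuple


def type_errors(pred):
    """Return a list of typing problems for one proposed predicate."""
    sig = SIGNATURES.get(pred.name)
    if sig is None:
        return [f"unknown predicate: {pred.name}"]
    if len(sig) != len(pred.args):
        return [f"arity mismatch in {pred.name}"]
    errors = []
    for arg, expected in zip(pred.args, sig):
        actual = ENTITY_TYPES.get(arg)
        if actual != expected:
            errors.append(f"{pred.name}: {arg} is {actual}, expected {expected}")
    return errors


def temporal_errors(preds):
    """Flag pairs that assert both precedes(a, b) and precedes(b, a)."""
    order = {p.args for p in preds if p.name == "precedes"}
    return [
        f"cycle: {a} <-> {b}"
        for (a, b) in sorted(order)
        if (b, a) in order and a < b  # report each conflicting pair once
    ]


def verify(preds):
    """Split proposals into well-typed and rejected, then check ordering conflicts."""
    accepted, rejected = [], []
    for p in preds:
        errs = type_errors(p)
        if errs:
            rejected.append((p, errs))
        else:
            accepted.append(p)
    return accepted, rejected, temporal_errors(accepted)


# Stand-in for LLM output: four proposed predicates, two of them problematic.
proposed = [
    Predicate("elevated_in", ("p_tau181", "alzheimers_disease")),
    Predicate("elevated_in", ("screening", "alzheimers_disease")),  # ill-typed
    Predicate("precedes", ("screening", "baseline_mri")),
    Predicate("precedes", ("baseline_mri", "screening")),  # contradicts the previous line
]
accepted, rejected, conflicts = verify(proposed)
print(len(accepted), len(rejected), conflicts)
```

In this toy setting the verifier rejects the ill-typed proposal and reports the contradictory ordering, which is the kind of machine-checkable feedback a real ASP or temporal-logic backend would provide at much greater expressiveness.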

Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.

Funding: No, the presenting author of this submission does *not* fall under ICLR’s funding aims, or has sufficient alternate funding.

Submission Number: 142
