Agentic Knowledge Computing for Automated Biomarker Validation: Triangulated Causal Graph Construction in ALS Research

Published: 08 Nov 2025, Last Modified: 08 Nov 2025
Venue: NeurIPS 2025 Workshop NORA Poster
Readers: Everyone
License: CC BY 4.0
Keywords: Causal Knowledge Graphs, Triangulated Causal Validation Score (TCVS), Multi-Model Node Matching, Louvain Community Clusters, Knowledge Curator Agents, Counterfactual Analysis, Amyotrophic Lateral Sclerosis (ALS), Computational Biology, Agentic AI Systems, Automatic Knowledge Curation
TL;DR: This paper presents a multi-model NLP framework achieving 94.62% precision in extracting ALS causal relationships via TCVS scoring, constructing a validated knowledge graph, and proposing agentic extensions for collaborative curation and Graph RAG.
Abstract: Amyotrophic Lateral Sclerosis (ALS) generates a vast literature containing critical relationships between biomarkers, pathogenic mechanisms, and therapeutic targets. Extracting and validating these relationships at scale remains challenging because of the complexity of biomedical language and the domain expertise required. We present a novel NLP framework combining foundation models with domain-specific embeddings to automatically extract, validate, and organize ALS knowledge from scientific literature. Our approach introduces the Triangulated Causal Validation Score (TCVS), a three-tier scoring mechanism that fuses outputs from the Mistral-7B, BioLinkBERT-large, and PubMedBERT-MNLI models and checks them against four curated gold-standard ALS term lists. The framework processes documents through GROBID-based extraction and validates 4,689 unique terms and 3,840 causal relationships, achieving 94.62% precision and 95.65% recall against expert-labeled datasets. We construct a Causal Knowledge Graph (CKG) with weighted edges and apply Louvain community clustering to identify 150 major functional groups, revealing novel connections between biomarkers and ALS disease progression pathways. Counterfactual analysis demonstrates the framework's ability to predict downstream effects of biomarker or genetic perturbations. We further propose agentic extensions enabling collaborative multi-agent systems for specialized knowledge curation and graph-based retrieval-augmented generation. This work contributes: (1) TCVS, a generalizable validation methodology; (2) hybrid node-matching and similarity computation; (3) a demonstration of the advantages of multi-model fusion; and (4) a reproducible pipeline with agentic extensibility for domain-specific knowledge graph construction, reducing manual curation effort by 40% while maintaining expert-level accuracy.
Submission Number: 6
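The abstract describes fusing three model outputs into a validation score, building a weighted causal graph, and tracing downstream effects of a perturbation. The sketch below illustrates that general shape only. The fusion weights, the acceptance threshold, the weighted-mean formula, the path-product propagation rule, and all node names are assumptions made for illustration; the abstract does not specify how TCVS or the counterfactual analysis is actually computed.

```python
from collections import defaultdict

# Hypothetical fusion weights: the abstract does not state how the tiers are combined.
TIER_WEIGHTS = {"mistral": 0.4, "biolinkbert": 0.3, "pubmedbert_mnli": 0.3}


def fused_score(tier_scores):
    """Weighted mean of per-model confidences in [0, 1] (illustrative, not the TCVS formula)."""
    return sum(TIER_WEIGHTS[name] * tier_scores[name] for name in TIER_WEIGHTS)


def build_graph(relations, threshold=0.5):
    """Keep (cause, effect) edges whose fused score clears an assumed threshold."""
    graph = defaultdict(dict)
    for cause, effect, tier_scores in relations:
        score = fused_score(tier_scores)
        if score >= threshold:
            graph[cause][effect] = score
    return graph


def downstream_effects(graph, source):
    """Map each node reachable from `source` to its strongest path strength.

    Path strength is the product of edge weights along the path, an assumed
    stand-in for whatever propagation rule the counterfactual analysis uses.
    """
    best = {source: 1.0}
    stack = [source]
    while stack:
        node = stack.pop()
        for nxt, weight in graph.get(node, {}).items():
            strength = best[node] * weight
            if strength > best.get(nxt, 0.0):
                best[nxt] = strength
                stack.append(nxt)
    del best[source]
    return best


# Toy relations with placeholder names and made-up scores (not data from the paper).
toy_relations = [
    ("gene_X", "pathway_Y", {"mistral": 0.9, "biolinkbert": 0.8, "pubmedbert_mnli": 0.8}),
    ("pathway_Y", "phenotype_Z", {"mistral": 0.6, "biolinkbert": 0.6, "pubmedbert_mnli": 0.6}),
    ("gene_X", "noise", {"mistral": 0.1, "biolinkbert": 0.1, "pubmedbert_mnli": 0.1}),
]

if __name__ == "__main__":
    ckg = build_graph(toy_relations)
    print(downstream_effects(ckg, "gene_X"))
```

In this toy setup the low-scoring edge is filtered out before the graph is built, so a perturbation of `gene_X` propagates only along the retained edges. A community-detection step such as Louvain clustering would operate on the same weighted edge structure.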