An Analysis Graph for Statistical Genetics Agents

26 May 2026 (modified: 26 May 2026), VLDB 2026 Workshop BioDMS Submission, CC BY 4.0
Submission Type: Project Talk Proposal
Keywords: Genetics, Knowledge Graph, Agents
TL;DR: We created a knowledge graph used by coding agents for statistical genetics analysis tasks and provenance tracking.
Abstract: Statistical genetics workflows have grown more complex in recent years, with new methods, software, genome annotations, and publicly available GWAS summary statistics outpacing what any one researcher can keep track of. The problem is sharpest for clinical investigators, who often have deep disease expertise but limited bandwidth to track an expanding post-GWAS toolkit. Coding agents can run the software, but command execution alone does not produce a reviewable, reusable analysis. We propose a graph-backed analysis agent for statistical genetics. Skills describe each analysis task; the agent executes them and uses the graph to pick methods, reconcile mismatched inputs, and route results between stages. The graph also records typed claims about every artifact, command, reference, and result, including dataset identity, genome build, linkage-disequilibrium (LD) reference panel, software version, quality-control summaries, and unresolved expert-review decisions. In a preliminary case study, the agent ran a nine-stage post-GWAS pipeline on a public inflammatory bowel disease (IBD) GWAS, finishing in 94.7 minutes on a single laptop CPU. The agent recovered the canonical IBD genetic architecture and left a queryable, reviewable record of every method, reference, and decision.
Submission Number: 12