Affinage: Genome-Scale Mechanistic Gene Annotation from the Published Literature

Published: 28 May 2026, Last Modified: 28 May 2026. Venue: GenBio 2026 Poster. License: CC BY 4.0
Keywords: gene annotation, LLM pipeline, mechanistic reasoning, genome-scale, protein function, literature mining, evaluation framework
TL;DR: Affinage annotates all human genes with PMID-anchored mechanistic narratives from the literature, filling gaps where UniProt is sparse, and serves as a reusable base layer for downstream LLM reasoning.
Abstract: Gene-level annotations are a bottleneck both for biologists reasoning about unfamiliar genes and for computational pipelines that embed or reason over per-gene descriptions: literature-grounded LLM retrieval is expensive per gene, while curated databases lag the literature. Here, we present Affinage, an LLM pipeline that performs literature retrieval and mechanistic reasoning once per gene, using a biologist-designed reading pass that extracts only direct experimental evidence, and stores the result as a reusable, structured, PubMed ID (PMID)-anchored annotation. The synthesis pass produces a mechanistic narrative, a per-finding mechanistic history with open questions, and a structured mechanism profile; pre-existing database sources are not considered during synthesis. Applied genome-wide to all human protein-coding genes in a two-pass pipeline with deterministic structural-QC retry, Affinage produces a substantive mechanistic narrative for 92% of annotated genes, including 28% of the genome where UniProt’s curated function field is empty or under 200 characters. All 19,291 records are available through a public REST API and MCP server at https://affinage.wi.mit.edu, designed as a stackable base layer for downstream reasoning systems.
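To make the record shape and the retry step described in the abstract concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the authors' implementation: the class names, field names, QC rules, and the `synthesize` callback are all hypothetical, and the actual Affinage schema and API are not specified in this abstract.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Finding:
    """One entry of the per-finding mechanistic history (hypothetical shape)."""
    claim: str
    pmids: List[str]                      # PubMed IDs anchoring the claim
    open_questions: List[str] = field(default_factory=list)


@dataclass
class GeneAnnotation:
    """A structured, PMID-anchored record for one gene (hypothetical shape)."""
    gene: str
    narrative: str                        # mechanistic narrative
    history: List[Finding]                # per-finding mechanistic history
    mechanism_profile: Dict[str, str]     # structured mechanism profile


def structural_qc(record: GeneAnnotation) -> List[str]:
    """Deterministic structural checks; returns a list of problems (empty = pass).

    These rules are assumptions chosen for illustration only.
    """
    problems = []
    if not record.narrative.strip():
        problems.append("empty narrative")
    if not record.history:
        problems.append("no findings")
    for i, finding in enumerate(record.history):
        if not finding.pmids:
            problems.append(f"finding {i} cites no PMID")
        elif not all(p.isdigit() for p in finding.pmids):
            problems.append(f"finding {i} has a malformed PMID")
    return problems


def annotate_with_retry(
    gene: str,
    synthesize: Callable[[str, List[str]], GeneAnnotation],
    max_attempts: int = 3,
) -> GeneAnnotation:
    """Call the (hypothetical) synthesis step until the record passes QC.

    `synthesize` receives the gene and the problems found on the previous
    attempt, so a retry can be steered by the QC feedback.
    """
    problems: List[str] = []
    for _ in range(max_attempts):
        record = synthesize(gene, problems)
        problems = structural_qc(record)
        if not problems:
            return record
    raise ValueError(f"{gene}: structural QC failed: {problems}")
```

Because the checks are purely structural and deterministic, the same record always passes or fails in the same way, so a retry is triggered only by a malformed output and never by a second model judgment.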
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 134