Grounded autonomous research: a fault-tolerant LLM pipeline from corpus to manuscript in frontier computational physics
Track: Track 1: Original Research/Position/Education/Attention Track
Keywords: LLM agents, autonomous research, computational physics, grounded reasoning, fault tolerance, materials science
TL;DR: An LLM autonomously conceives, calibrates, computes, and writes a publication-grade frontier-physics manuscript by being grounded in literature throughout — across 47 fresh-context sessions and 2162 literature consultations.
Abstract: Autonomous-research agents have demonstrated end-to-end LLM automation in machine-learning sandboxes where execution provides calibration. Frontier physical science differs categorically: physical reasoning underlies parameter choices, and frontier toolchains and methodologies are often underdocumented. Researchers handle this by grounding in literature throughout — from idea conception, through methodology calibration, to manuscript writing. Unscaffolded LLM agents systematically fail to do this, hallucinating from internal priors and confidently producing plausible but unverifiable results.
Our pipeline bridges this gap, running end-to-end from a corpus of ~9,700 recent condensed-matter physics arXiv papers to a publication-grade manuscript with three substantive physics findings (here on altermagnetic piezomagnetism). The agent autonomously conceives a research direction by mapping the corpus, calibrates methodology by reproducing published references, conducts novel first-principles computations, and writes the manuscript, remaining grounded in literature throughout. The pipeline runs as 47 fresh-context sessions across six phases sharing only on-disk state, with 2162 literature-consultation events. Fault tolerance emerges from redundancy: fresh-context isolation, distributed grounding, and adversarial review catch what any single session misses. The pre- and post-pilot stages are fully autonomous; the pilot stage requires bounded human intervention only at reproduction failures, and that intervention is operational knowledge curation, not scientific direction.
Grounding in literature is universal across real research. The primitives, lessons, characterized failure modes, and bounded human intervention articulated here lay a foundation for autonomous research beyond computational physics, in high-stakes realistic scientific domains.
Submission Number: 280