Sibyl: A Multi-Agent Pipeline for Autonomous Hypothesis Generation

Published: 30 May 2026, Last Modified: 06 Jun 2026
Venue: ICML2026-AI4Science Poster
License: CC BY 4.0
Additional Submission Instructions: For the camera-ready version, please include the author names and affiliations, funding disclosures, and acknowledgements.
Track: Track 3: AI Scientist Proposal Competition
Keywords: AI scientist, multi-agent systems, literature-based discovery, hypothesis generation, scientific reasoning, provenance auditing
TL;DR: Multi-agent LLM pipeline generating falsifiable scientific predictions from literature, with temporal backtesting (18% confirmed), human checkpoints, and cross-prediction consistency auditing for hallucination detection.
Abstract: We present Sibyl, a multi-agent LLM pipeline that autonomously produces falsifiable predictions from published scientific literature, covering literature synthesis, knowledge representation, hypothesis generation, and hypothesis evaluation. The system uses a tiered agent architecture with two mandatory human-in-the-loop checkpoints and an automated provenance audit, evaluated through a temporal backtesting framework. In a proof-of-concept deployment on X-ray binary astrophysics, the system generated 60 predictions from pre-2015 literature, of which 11 (18%) were confirmed by independent post-2015 publications (12.5% under the most conservative provenance filters). These are preliminary results from an ongoing project.
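The temporal backtesting arithmetic in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Prediction` record, its field names, and the helper functions are hypothetical, and the treatment of the boundary year (sources strictly before the cutoff, confirmations at or after it) is an assumption the abstract does not settle. The data in the usage lines are synthetic placeholders that only reproduce the reported counts (60 predictions, 11 confirmed).

```python
from dataclasses import dataclass, field
from typing import List

CUTOFF_YEAR = 2015  # literature cutoff stated in the abstract


@dataclass
class Prediction:
    """Hypothetical record for one generated prediction."""
    text: str
    source_years: List[int] = field(default_factory=list)      # years of the literature it was derived from
    confirming_years: List[int] = field(default_factory=list)  # years of publications that later confirmed it


def uses_only_prior_literature(p: Prediction, cutoff: int = CUTOFF_YEAR) -> bool:
    # Assumption: a prediction is eligible only if every source predates the cutoff.
    return all(year < cutoff for year in p.source_years)


def is_confirmed(p: Prediction, cutoff: int = CUTOFF_YEAR) -> bool:
    # Assumption: a confirmation counts only if published at or after the cutoff.
    return any(year >= cutoff for year in p.confirming_years)


def confirmation_rate(preds: List[Prediction], cutoff: int = CUTOFF_YEAR) -> float:
    """Fraction of eligible predictions confirmed by later publications."""
    eligible = [p for p in preds if uses_only_prior_literature(p, cutoff)]
    if not eligible:
        return 0.0
    return sum(is_confirmed(p, cutoff) for p in eligible) / len(eligible)


# Synthetic placeholders matching the reported counts, not the real predictions.
predictions = [
    Prediction(f"prediction {i}", source_years=[2010],
               confirming_years=[2017] if i < 11 else [])
    for i in range(60)
]
rate = confirmation_rate(predictions)
print(f"{rate:.1%}")
```

With these placeholder counts the rate is 11/60, which rounds to the 18% quoted above. The 12.5% figure under the most conservative provenance filters is not reproduced here, because the abstract does not state which predictions those filters exclude.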
Submission Number: 22