ProteinHypothesis: A Physics-Aware Chain of Multi-Agent RAG LLM for Hypothesis Generation in Protein Science

Published: 05 Mar 2025, Last Modified: 28 Mar 2025, Venue: ICLR 2025 Workshop AgenticAI (Poster), License: CC BY 4.0
Keywords: Hypothesis Generation, Multi-Agent LLM, Retrieval-augmented Generation (RAG), Protein Science
Abstract: Scientific hypothesis generation is fundamental to advancing molecular biology and protein science. This study presents a novel AI-driven multi-agent framework that integrates Retrieval-Augmented Generation (RAG) with structured experimental data for automated hypothesis generation and validation. The methodology combines scientific literature retrieval, structured dataset analysis, and multi-agent evaluation so that generated hypotheses are scientifically rigorous and experimentally testable. The framework consists of three key phases: (1) Hypothesis Generation, where insights from literature and structured data are synthesized using large language models; (2) Multi-Agent Evaluation through a Chain-of-Thought (CoT) mechanism, where hypotheses are assessed for internal consistency, feasibility, novelty, scientific impact, and scalability/generalizability; and (3) Final Selection and Validation, where high-scoring hypotheses are refined by protein-specialized agents and linked to experimental validation strategies such as molecular dynamics simulations, site-directed mutagenesis, and structural characterization. Results demonstrate the system’s ability to generate novel, high-impact hypotheses in protein stability, enzyme catalysis, ligand interactions, and biomolecular interactions, with broad applications in drug discovery, synthetic biology, and protein engineering. The study highlights the potential of AI-driven hypothesis generation to accelerate scientific discovery by integrating machine learning, structured data analysis, and multi-agent validation into research workflows. Our code is available at https://github.com/adibgpt/ProteinHypothesis.
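The evaluation and selection phases described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). It assumes each of the five criteria yields a score in [0, 1], that scores are aggregated by an unweighted mean, and that selection keeps the top-ranked hypotheses above a threshold. The names `Hypothesis`, `evaluate`, `select`, `keyword_stub`, the threshold, and the stub evaluators are all hypothetical; in a real system each evaluator would wrap an LLM agent.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, Dict, List

# The five evaluation criteria named in the abstract.
CRITERIA = [
    "internal_consistency",
    "feasibility",
    "novelty",
    "scientific_impact",
    "generalizability",
]

# Hypothetical evaluator type: maps hypothesis text to a score in [0, 1].
Evaluator = Callable[[str], float]


@dataclass
class Hypothesis:
    text: str
    scores: Dict[str, float] = field(default_factory=dict)

    @property
    def overall(self) -> float:
        # Assumption: unweighted mean of the per-criterion scores.
        return mean(self.scores.values()) if self.scores else 0.0


def evaluate(hypotheses: List[Hypothesis],
             evaluators: Dict[str, Evaluator]) -> List[Hypothesis]:
    """Phase 2 (sketch): score every hypothesis on every criterion."""
    for h in hypotheses:
        for criterion in CRITERIA:
            h.scores[criterion] = evaluators[criterion](h.text)
    return hypotheses


def select(hypotheses: List[Hypothesis],
           threshold: float = 0.7, top_k: int = 3) -> List[Hypothesis]:
    """Phase 3 (sketch): keep the top_k hypotheses at or above the threshold."""
    passing = [h for h in hypotheses if h.overall >= threshold]
    return sorted(passing, key=lambda h: h.overall, reverse=True)[:top_k]


def keyword_stub(keyword: str) -> Evaluator:
    """Placeholder for an LLM-backed agent: scores by keyword presence only."""
    return lambda text: 0.9 if keyword in text else 0.5


# Usage with stub evaluators standing in for the agents.
stub_evaluators = {c: keyword_stub("mutation") for c in CRITERIA}
candidates = [
    Hypothesis("A point mutation near the active site increases thermal stability."),
    Hypothesis("Ligand binding is unaffected by solvent composition."),
]
selected = select(evaluate(candidates, stub_evaluators))
for h in selected:
    print(h.text)
```

An unweighted mean keeps the sketch simple; a weighted aggregate or a per-criterion minimum would be equally plausible choices, and the abstract does not say which the framework uses.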
Submission Number: 19