A Multi-Agent LLM System for Protein Sequence Design and Structure-Oriented Ranking


Agents4Science 2025 Conference Submission 326 Authors

17 Sept 2025 (modified: 08 Oct 2025). Submitted to Agents4Science. License: CC BY 4.0

Keywords: Protein Design, Large Language Models (LLMs), Multi-Agent Systems, AlphaFold2, Generative Biology, Autonomous AI, Synthetic Biology, De Novo Sequence Generation, AI-Driven Scientific Discovery, Agent Mode

TL;DR: Autonomous LLM agents cooperatively generate and prioritize synthetic protein sequences, producing candidates with predicted structural plausibility and requiring minimal human input.

Abstract: We present a modular, multi-agent generative framework for de novo protein sequence design and prioritization, developed and executed primarily by autonomous AI agents. The system uses cooperative large language models (LLMs) to generate amino acid segments in parallel, with each agent responsible for one subsequence. A downstream aggregation and refinement stage assembles these segments into complete sequences, which are then filtered and ranked using interpretable biophysical heuristics. We generate 100 protein sequences with this workflow and evaluate their plausibility through property distributions, unsupervised clustering, and AlphaFold2-based structure prediction. Despite operating without evolutionary templates or functional labels, several top-ranked candidates show moderate structural confidence (mean pLDDT > 60, pDockQ > 0.5), suggesting that LLMs encode useful compositional priors. Our results support agentic LLM architectures, paired with lightweight scoring and minimal human intervention, as a scalable strategy for upstream protein design pipelines.
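The abstract does not specify which heuristics are used, so the following is only a minimal sketch of the assemble-then-rank stage under stated assumptions. The function names (`assemble`, `heuristic_score`, `rank`), the choice of the Kyte-Doolittle GRAVY score, net charge per residue, and a homopolymer-run penalty, as well as the target GRAVY value and weights, are all hypothetical illustrations, not the authors' method.

```python
# Hypothetical sketch: assemble agent-generated segments, then rank by simple heuristics.
# Heuristics, weights, and thresholds are illustrative assumptions.

# Kyte-Doolittle hydropathy scale for the 20 canonical amino acids
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
VALID = set(KD)


def assemble(segments: list[str]) -> str:
    """Concatenate per-agent subsequences in order and reject non-canonical residues."""
    seq = "".join(s.strip().upper() for s in segments)
    if not seq:
        raise ValueError("empty sequence")
    invalid = set(seq) - VALID
    if invalid:
        raise ValueError(f"non-canonical residues: {sorted(invalid)}")
    return seq


def gravy(seq: str) -> float:
    """Grand average of hydropathy (mean Kyte-Doolittle value)."""
    return sum(KD[a] for a in seq) / len(seq)


def net_charge(seq: str) -> int:
    """Crude net charge near neutral pH: (K + R) - (D + E); ignores His and termini."""
    return sum(seq.count(a) for a in "KR") - sum(seq.count(a) for a in "DE")


def longest_run(seq: str) -> int:
    """Length of the longest single-residue (homopolymer) run."""
    best = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best


def heuristic_score(seq: str, target_gravy: float = -0.2, max_run: int = 4) -> float:
    """Higher is better: negative sum of penalties (all weights are assumptions)."""
    penalty = abs(gravy(seq) - target_gravy)           # hydropathy far from target
    penalty += abs(net_charge(seq)) / len(seq)          # strong net charge per residue
    penalty += max(0, longest_run(seq) - max_run) * 0.5  # low-complexity stretches
    return -penalty


def rank(candidates: list[str], top_k: int = 10) -> list[str]:
    """Return the top_k candidates by heuristic score, best first."""
    return sorted(candidates, key=heuristic_score, reverse=True)[:top_k]


if __name__ == "__main__":
    pool = [
        assemble(["AAAA", "AAAA", "LLLL"]),   # hydrophobic, low-complexity
        assemble(["MKT", "AYIA", "KQRQ"]),    # more varied composition
    ]
    for s in rank(pool):
        print(s, round(heuristic_score(s), 3))
```

Keeping each heuristic as a separate, named function mirrors the abstract's emphasis on interpretability: each penalty term can be inspected or reweighted independently before candidates are passed to structure prediction.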

Supplementary Material: zip

Submission Number: 326
