Keywords: Genomic foundation models, biosynthetic gene clusters, natural product discovery, metagenomics, genome mining, agentic biology
TL;DR: BGC-Master detects biosynthetic gene clusters directly from DNA using frozen Evo2 embeddings and prioritizes novel metagenomic candidates for wet-lab follow-up.
Abstract: Biosynthetic gene clusters (BGCs) encode the enzymatic machinery behind microbial natural products, yet current mining tools remain biased toward known biosynthetic motifs. This reliance limits the retrieval of clusters that diverge from canonical motifs in expanding metagenomic collections. We present BGC-Master, a method that uses frozen Evo2 7B embeddings and a compact one-dimensional U-Net decoder to localize biosynthetic gene clusters directly from genomic sequence. BGC-Master achieves an F1 of 0.521 on a held-out 9-genome benchmark, outperforming antiSMASH 7 (0.256), DeepBGC (0.117), and BGC-Prophet (0.393) under a shared overlap-based evaluator. Applied to the OER004256 marine metagenome, BGC-Master prioritizes 171 reference-unannotated candidates with biosynthetic evidence and low MIBiG similarity, including regions not recovered by antiSMASH expanded detection. By exposing detection and evidence-based prioritization as agent-ready tools, BGC-Master provides a modular substrate for future agentic natural-product discovery workflows.
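The abstract reports F1 under a shared overlap-based evaluator but does not state the matching rule. As a minimal sketch of how such a score could be computed, the Python below assumes that a predicted region matches a reference region when their overlap covers at least `min_frac` of the reference length. The function names, the half-open coordinate convention, and the 0.5 threshold are illustrative assumptions, not the authors' evaluator.

```python
from typing import List, Tuple

# Half-open [start, end) coordinates on a single contig (assumed convention).
Interval = Tuple[int, int]


def overlap(a: Interval, b: Interval) -> int:
    """Number of bases shared by two intervals (0 if they are disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def overlap_f1(
    pred: List[Interval],
    ref: List[Interval],
    min_frac: float = 0.5,  # hypothetical threshold, not taken from the paper
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under an overlap-based matching rule."""

    def matches(p: Interval, r: Interval) -> bool:
        # Assumed rule: the overlap must cover at least min_frac of the reference.
        return overlap(p, r) >= min_frac * (r[1] - r[0])

    # Predictions that hit at least one reference region.
    hit_preds = sum(any(matches(p, r) for r in ref) for p in pred)
    # Reference regions recovered by at least one prediction.
    hit_refs = sum(any(matches(p, r) for p in pred) for r in ref)

    precision = hit_preds / len(pred) if pred else 0.0
    recall = hit_refs / len(ref) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    predicted = [(0, 100), (500, 600), (900, 1000)]
    reference = [(10, 110), (480, 520)]
    print(overlap_f1(predicted, reference))
```

Counting precision over predictions and recall over reference regions separately avoids inflating the score when one long prediction spans several annotated clusters. Other conventions (for example, reciprocal overlap or base-level scoring) would give different numbers for the same predictions.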
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 71