Keywords: protein engineering, active learning, conformal prediction, biological agents, benchmark
TL;DR: We introduce a safety-filtered benchmark and a constrained agentic policy that improves fixed-budget top-1% protein variant discovery over standard active-learning and LLM-only planning baselines.
Abstract: Low-throughput protein engineering is a fixed-budget sequential decision problem: with only a small number of assay slots, the goal is top-tail discovery rather than global regression accuracy. We present CIDER-Bench, a safety-filtered retrospective benchmark that converts ProteinGym deep-mutational-scanning assays into batched design-build-test-learn campaigns, and CIDER-AGENT, a constrained policy that combines conformal top-tail calibration, information-directed acquisition, and diversity-aware batch optimization. The language-model component is limited to bounded controller actions with auditable traces and cannot propose variants outside the candidate set. In 48-query campaigns over 20 benign landscapes, CIDER-AGENT improves rare-hit discovery over static PLM ranking, standard active-learning baselines, LLM-only planners, and a FolDE baseline while maintaining zero invalid actions. Code, benchmark artifacts, and run protocol are available at https://anonymous.4open.science/r/genbio-cider-65A3/.
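The fixed-budget setting described in the abstract can be illustrated with a minimal sketch of a retrospective campaign replay. This is not the CIDER-Bench or CIDER-AGENT implementation: the function names (`top_tail_set`, `run_campaign`, `static_ranking_policy`), the batch size of 8, and the synthetic Gaussian landscape are all assumptions made for illustration. Only the 48-query budget, the top-1% target, the closed candidate set, and the invalid-action count come from the text above.

```python
import math
import random


def top_tail_set(fitness, fraction=0.01):
    """Indices of the top `fraction` of variants by measured fitness."""
    k = max(1, math.ceil(fraction * len(fitness)))
    ranked = sorted(range(len(fitness)), key=lambda i: fitness[i], reverse=True)
    return set(ranked[:k])


def run_campaign(fitness, policy, budget=48, batch_size=8):
    """Replay a batched campaign over a closed candidate pool.

    `policy(labeled, unlabeled, n)` is a hypothetical interface: it sees the
    labels revealed so far and returns up to `n` candidate indices.
    Proposals outside the unqueried pool (or duplicates) count as invalid.
    """
    unlabeled = set(range(len(fitness)))
    labeled = {}  # index -> revealed fitness
    invalid = 0
    while len(labeled) < budget and unlabeled:
        n = min(batch_size, budget - len(labeled))
        valid = []
        for i in policy(dict(labeled), set(unlabeled), n):
            if i in unlabeled and i not in valid and len(valid) < n:
                valid.append(i)
            else:
                invalid += 1
        if not valid:
            break  # the policy proposed nothing usable; stop early
        for i in valid:
            labeled[i] = fitness[i]  # "assay" the variant by table lookup
            unlabeled.discard(i)
    hits = top_tail_set(fitness) & set(labeled)
    return {"queried": len(labeled), "hits": len(hits), "invalid": invalid}


def static_ranking_policy(prior_scores):
    """Baseline that ignores feedback and follows a fixed score ranking."""
    def policy(labeled, unlabeled, n):
        ranked = sorted(unlabeled, key=lambda i: prior_scores[i], reverse=True)
        return ranked[:n]
    return policy


# Synthetic stand-in for one assay landscape (assumption, not real data).
rng = random.Random(0)
fitness = [rng.gauss(0.0, 1.0) for _ in range(2000)]

# An oracle ranking gives the ceiling for this budget; a shuffled ranking
# gives a chance-level reference.
oracle = run_campaign(fitness, static_ranking_policy(fitness))
shuffled_scores = fitness[:]
rng.shuffle(shuffled_scores)
chance = run_campaign(fitness, static_ranking_policy(shuffled_scores))
```

Under these assumptions, the hit count is bounded by the size of the top-1% set (20 of 2000 variants here), so a policy is judged by how many of those it reaches within 48 queries rather than by regression error over the whole pool.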
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 233