Track: Biology: datasets and/or experimental results
Nature Biotechnology: Yes
Keywords: Peptide Discovery, Data-Efficient Algorithms, Computational Biology, Machine Learning, GFP
TL;DR: MDMI designs novel, highly divergent peptides from limited data, reducing the need for the large datasets traditionally required for peptide discovery.
Abstract: Peptide biologics represent a promising therapeutic frontier, but machine learning approaches to their discovery and optimization are often hindered by the need for extensive training datasets. Here we present Minimal Data Maximal Insight (MDMI), a novel computational method that enables peptide discovery from limited data (~100 sequences). Using a split Green Fluorescent Protein (GFP) system as our model, we develop a sequence-agnostic model that combines statistical potential scoring with physics-based evaluation into an ensemble predictive model, which we couple with a genetic algorithm for sequence optimization. After only one round of screening, the model yielded novel sequences, 63% of which exhibited fluorescence. Notably, by analyzing high-activity sequences to identify favorable amino acids at each position, we designed functional peptide variants with more than 50% sequence divergence from the wild type, far exceeding the mutation rates present in our training data. By reducing dependence on large datasets, MDMI democratizes access to advanced computational tools for peptide engineering and offers a blueprint for accelerating therapeutic peptide discovery across applications ranging from antimicrobials to targeted drug delivery.
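The abstract couples an ensemble score with a genetic algorithm and draws mutations from residues that are favorable at each position in high-activity sequences. Below is a minimal Python sketch of that loop under stated assumptions. `hydrophobic_score` and `charge_score` are hypothetical stand-ins for the statistical-potential and physics-based components, which the abstract does not specify. The wild-type and "high-activity" sequences are placeholders, not the paper's GFP data. The GA settings (population size, generations, mutation rate, elitist top-half selection, one-point crossover) are illustrative choices, not the authors' configuration.

```python
import random
from collections import Counter

def favorable_residues(high_activity, top_k=3):
    """Per position, return the top_k most frequent residues among high-activity sequences."""
    table = []
    for i in range(len(high_activity[0])):
        counts = Counter(seq[i] for seq in high_activity)
        table.append([aa for aa, _ in counts.most_common(top_k)])
    return table

def identity_fraction(a, b):
    """Fraction of positions at which two equal-length sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

# Hypothetical toy scorers standing in for the ensemble components.
def hydrophobic_score(seq):
    return sum(aa in "AILMFVW" for aa in seq) / len(seq)

def charge_score(seq):
    net = sum((aa in "KR") - (aa in "DE") for aa in seq)
    return 1 - abs(net) / len(seq)

def ensemble_score(seq):
    # Simple average of the component scores (an assumed combination rule).
    return (hydrophobic_score(seq) + charge_score(seq)) / 2

def evolve(score, wild_type, allowed, pop_size=50, generations=30, mut_rate=0.2, seed=0):
    """Elitist genetic algorithm; mutations draw only from allowed[i] at position i."""
    rng = random.Random(seed)
    length = len(wild_type)

    def mutate(seq):
        return "".join(
            rng.choice(allowed[i]) if rng.random() < mut_rate else seq[i]
            for i in range(length)
        )

    def crossover(a, b):
        cut = rng.randrange(1, length)  # one-point crossover; assumes length >= 2
        return a[:cut] + b[cut:]

    # Seed with the wild type so the best score can never fall below it.
    population = [wild_type] + [mutate(wild_type) for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]  # keep the top half unchanged
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents)))
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=score)

# Placeholder sequences for illustration only.
wild_type = "ACDEFGHIKL"
high_activity = ["AVDEFLHIKL", "ICDEFLHVKL", "AVDKFLHIRL"]
allowed = favorable_residues(high_activity, top_k=2)
best = evolve(ensemble_score, wild_type, allowed)
print(best, "divergence from wild type:", 1 - identity_fraction(best, wild_type))
```

Because the top half of each generation is carried over unchanged and the wild type is in the starting population, the best ensemble score is non-decreasing. Restricting mutations to per-position favorable residues is one way to let designs drift far from the wild type while staying within residues observed in active variants.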
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Presenter: ~Pouriya_Bayat1
Format: Maybe: the presenting author will attend in person, contingent on other factors that still need to be determined (e.g., visa, funding).
Funding: Yes, the presenting author of this submission falls under ICLR’s funding aims, and funding would significantly impact their ability to attend the workshop in person.
Submission Number: 97