A Symbolic Regression Screening Approach Within Peptide Optimisation

Aidan Murphy, Mark Kocherovsky, Nir Dayan, Iliya Miralavy, Assaf A. Gilad, Wolfgang Banzhaf

Published: 2025, Last Modified: 28 Feb 2026, Venue: EvoApplications (2) 2025, License: CC BY-SA 4.0
Abstract: The Protein Optimization Evolving Tool is a genetic-programming-based peptide generation tool which has successfully created novel peptides with improved performance for MRI. However, like all supervised machine learning techniques, it may overfit to its library of training peptides and create peptides which do not improve functionality. To overcome this problem, we create symbolic regression models to act as another predictor of peptide function. We create a set of 76 features of physicochemical, theoretical and composite properties for each peptide and evolve the models using Grammatical Evolution on two datasets, one containing 74 peptides and the other 100 peptides. Models trained using these 76 features can successfully predict peptide functionality with a median MSE of 0.427 on the first dataset and 0.179 on the larger dataset, achieving state-of-the-art results on both. We next investigate whether a reduced set of 8 real-world features, which could result in more interpretable models, can accurately predict protein functionality. The models created on this reduced set were outperformed by models which used the full set of features on the first dataset, but were statistically equivalent on the second dataset. Finally, we downsample the data to 10%, 33% and 50% to evaluate the robustness of this approach. Our results show that models trained on as few as 7 peptides can be used as an additional measure of functionality within the Protein Optimization Evolving Tool.