Closed-Form Test Functions for Biophysical Sequence Optimization Algorithms

Published: 17 Jun 2024, Last Modified: 16 Jul 2024ML4LMS PosterEveryoneRevisionsBibTeXCC BY 4.0
Keywords: black-box-optimization, discrete-optimization, benchmarking, drug-discovery, protein-design
TL;DR: We propose a new closed-form test function for black-box sequence optimization with geometric similarities to real protein design problems
Abstract: There is a growing body of work seeking to replicate the success of machine learning (ML) on domains like computer vision (CV) and natural language processing (NLP) to applications involving biophysical data. One of the key ingredients of prior successes in CV and NLP was the broad acceptance of difficult benchmarks that distilled key subproblems into approachable tasks that any junior researcher could investigate, but good benchmarks for biophysical domains are rare. This scarcity is partially due to a narrow focus on benchmarks which simulate biophysical data; we propose instead to carefully abstract biophysical problems into simpler ones with key geometric similarities. In particular we propose a new class of closed-form test functions for biophysical sequence optimization, which we call Ehrlich functions. We provide empirical results demonstrating these functions are interesting objects of study and can be non-trivial to solve with a standard genetic optimization baseline.
Poster: pdf
Submission Number: 14
Loading