Abstract: Protein structure prediction has emerged as a powerful tool for biologists and drug developers. However, the computational cost of state-of-the-art models such as AlphaFold limits their scalability and makes training and fine-tuning prohibitively expensive. Although previous work has achieved considerable inference speedups by replacing the multiple sequence alignment step with protein language models, the overall architecture of structure prediction models, inherited from AlphaFold2, has remained largely unchanged. In this work, we show that protein language model-based structure predictors can be dramatically simplified with little to no loss in accuracy. Our model, MiniFold, consists of a redesigned Evoformer and a lightweight structure module. We also propose two novel GPU kernels tailored to the proposed architecture. Equipped with the same ESM2 protein language model, MiniFold is competitive with ESMFold on the standard CAMEO and CASP datasets while achieving training and inference speedups of up to 20x and significant reductions in peak memory usage. Our results show that MiniFold is an effective solution for large-scale applications and resource-constrained environments.
Submission Length: Regular submission (no more than 12 pages of main content)
Code: https://github.com/jwohlwend/minifold
Assigned Action Editor: ~Ole_Winther1
Submission Number: 4068