Track: Machine learning: computational method and/or computational results
Keywords: Autoregressive Language Models, Protein Design
TL;DR: We train Llama2 to generate protein binder sequences conditioned only on the target sequence.
Abstract: The targeting of disease-driving proteins is a critical goal of biomedicine. However, many of these proteins lack accessible small-molecule binding pockets and are often conformationally disordered, precluding binder design via structure-dependent methods. Here, we present PPI-Llama2, which adapts Meta's Llama2 autoregressive language model architecture to generate protein binders de novo, conditioned directly on target sequences. Without relying on structural data and training only on protein-protein interaction (PPI) sequences, PPI-Llama2 effectively learns the evolutionary semantics of PPIs, enabling the generation of binders that are both novel and biologically plausible. Comparative evaluations show that PPI-Llama2 generates binders for evolutionarily distant targets, performing strongly against structure-dependent methods such as RFDiffusion. In total, our findings showcase PPI-Llama2's potential to aid therapeutic discovery for diseases driven by undruggable and disordered target proteins, and motivate further experimental screening efforts.
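The conditioning scheme described in the abstract (generate a binder autoregressively given only the target sequence) can be illustrated with a minimal sketch. The special tokens `[TARGET]` and `[BINDER]` and the helper names below are assumptions for illustration only; the submission does not specify the prompt format.

```python
# Hypothetical sketch of target-conditioned prompt construction for an
# autoregressive PPI language model. Token names are assumptions, not
# the format used by PPI-Llama2.

AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")  # 20 standard residues

def format_ppi_prompt(target_seq: str) -> str:
    """Build a conditioning prompt: the target sequence followed by a
    binder-start token, after which the model would decode the binder."""
    if not set(target_seq) <= AMINO_ACIDS:
        raise ValueError("target contains non-standard residues")
    return f"[TARGET]{target_seq}[BINDER]"

def is_valid_binder(seq: str) -> bool:
    """Check that a decoded binder uses only standard residues."""
    return bool(seq) and set(seq) <= AMINO_ACIDS

# Example: the prompt that would be fed to the model's decoder.
prompt = format_ppi_prompt("MKTAYIAKQR")
print(prompt)  # [TARGET]MKTAYIAKQR[BINDER]
```

In practice, such a prompt would be tokenized and passed to the language model's sampling loop, with generated tokens after `[BINDER]` forming the candidate binder sequence.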
Submission Number: 73