AbLEF: Antibody Language Ensemble Fusion for thermodynamically empowered property predictions

Published: 25 Oct 2023, Last Modified: 10 Dec 2023, AI4D3 2023 Poster
Keywords: multimodal deep learning, protein language models, CNN, GNN, structural ensembles, molecular dynamics, protein property prediction
TL;DR: AbLEF fuses fine-tuned protein language models and structural ensemble representations to enhance antibody property prediction.
Abstract: Pre-trained protein language and/or structural models are often fine-tuned on drug development properties (i.e., developability properties) to accelerate drug discovery initiatives. However, these models generally rely on a single structural conformation and/or a single sequence as a molecular representation. We present a physics-based model whereby structural ensemble representations are fused by a transformer-based architecture and concatenated to a language representation to predict antibody properties. AbLEF infuses thermodynamic information directly into latent space, which enhances property prediction by explicitly capturing the dynamic molecular behavior that occurs during experimental measurement. We find that $\textbf{(1)}$ ensembles of structures generated from molecular simulation can further improve antibody property prediction for small datasets, $\textbf{(2)}$ fine-tuned large protein language models can match smaller antibody-specific language models at predicting antibody properties, $\textbf{(3)}$ trained multimodal sequence and structural representations outperform sequence representations alone, $\textbf{(4)}$ pre-trained sequence with structure models are competitive with shallow machine learning (ML) methods in the small-data regime, and $\textbf{(5)}$ predicting measured antibody properties remains difficult for limited high-fidelity datasets. AbLEF has been made publicly available at https://github.com/merck/AbLEF.
Submission Number: 55
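Illustrative sketch: The abstract describes fusing structural ensemble representations with a transformer-based architecture and concatenating the result to a language representation. The toy example below shows one way that data flow could look, and it is not taken from the AbLEF repository. It assumes each simulation frame has already been embedded as a fixed-length vector by some structural encoder, uses a single self-attention head followed by mean pooling as a stand-in for the transformer fusion, and uses a linear head for the property prediction. All names (`fuse_ensemble`, `predict_property`, `params`) and all dimensions are hypothetical, and the weights are random rather than trained.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_ensemble(frames, Wq, Wk, Wv):
    """Single-head self-attention across frame embeddings, then mean pooling.

    Assumption: a stand-in for the transformer-based fusion named in the
    abstract, not the published architecture.
    """
    Q = [matvec(Wq, f) for f in frames]
    K = [matvec(Wk, f) for f in frames]
    V = [matvec(Wv, f) for f in frames]
    d = len(Q[0])
    attended = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        attended.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(d)])
    n = len(attended)
    # Mean pooling collapses the ensemble into one fixed-length vector.
    return [sum(row[j] for row in attended) / n for j in range(d)]


def predict_property(frames, seq_embedding, params):
    """Concatenate fused structure and sequence vectors, apply a linear head."""
    fused = fuse_ensemble(frames, params["Wq"], params["Wk"], params["Wv"])
    joint = fused + list(seq_embedding)
    return sum(w * x for w, x in zip(params["w_out"], joint)) + params["b_out"]


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


# Hypothetical sizes: 5 simulation frames, 8-dim structure and 4-dim sequence embeddings.
rng = random.Random(0)
d_struct, d_seq, n_frames = 8, 4, 5
params = {
    "Wq": random_matrix(d_struct, d_struct, rng),
    "Wk": random_matrix(d_struct, d_struct, rng),
    "Wv": random_matrix(d_struct, d_struct, rng),
    "w_out": [rng.gauss(0.0, 0.1) for _ in range(d_struct + d_seq)],
    "b_out": 0.0,
}
frames = [[rng.gauss(0.0, 1.0) for _ in range(d_struct)] for _ in range(n_frames)]
seq_embedding = [rng.gauss(0.0, 1.0) for _ in range(d_seq)]

y = predict_property(frames, seq_embedding, params)
print(f"predicted property (untrained toy model): {y:.4f}")
```

Because this sketch applies attention without positional encodings and then averages, the prediction does not depend on the order in which frames are supplied. That treats the conformations as an unordered set, which is a design choice of the sketch rather than a claim about AbLEF.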