ASPred: Identification of Antigen Specific B-cell receptors from single V(D)J sequences using Large Language Models

Published: 18 Oct 2024, Last Modified: 18 Nov 2024, lxai-neurips-24, CC BY 4.0
Track: Full Paper
Abstract: Rapid sequencing of antibody genes has accelerated vaccine development. However, predicting synthetic antibodies capable of binding and neutralizing novel antigens remains challenging because the rules governing protein-protein interaction at the antigen surface where the cognate antibody binds are poorly understood. Recent advances in single-cell sequencing of antibody-producing B cells have improved the precision of mapping B-cell receptors (BCRs, the membrane-bound forms of antibodies) to their cognate antigens, but additional challenges remain. We have developed a computational strategy, the Antibody Specificity Predictor (ASPred), in which we trained two Large Language Models (LLMs) on known sequences of antigen-BCR pairs to predict antigen-specific BCRs from the total BCR repertoire of immunized mice. By leveraging the pattern recognition capabilities of LLMs, we successfully classify novel B-cell receptors against a challenge antigen not represented in the training set, without the need to preselect B cells by antigen binding. The properties of the top 10 predicted candidates were validated by coarse-grained molecular dynamics simulations. These results suggest that BCR-antigen sequence pairs contain sufficient information for LLMs to reliably predict antigen-antibody interaction specificity, potentially opening new avenues for the computational design of synthetic antibodies for vaccine and therapeutic development.
Submission Number: 55