Evaluating predictive patterns of antigen specific B cells by single cell transcriptome and antibody repertoire sequencing

Published: 04 Mar 2024, Last Modified: 07 May 2024, MLGenX 2024 Poster, CC BY 4.0
Keywords: immunogenomics, antibody repertoire sequencing, single cell transcriptome sequencing, machine learning for antibody discovery and engineering, single cell sequencing dataset
TL;DR: We introduce a new dataset of single cell transcriptome and antibody sequencing and evaluate various machine learning models for predicting antigen specificity.
Abstract: The field of antibody drug discovery relies substantially on extensive experimental screening of B cells from immunized animals. Machine learning (ML)-guided prediction of antigen-specific B cells offers the potential to accelerate antibody drug discovery; however, this requires sufficient labeled training data. To address this challenge, our study focuses on antigen specificity prediction using a novel dataset of B cells with single-cell transcriptome and antibody repertoire sequencing. We identify key patterns in gene expression (GEX) indicative of antigen specificity and elucidate the sequence diversity distribution of antigen-specific antibody sequences in immune repertoire data. We evaluate linear (Logistic Regression), non-linear (Support Vector Classification) and ensemble-based (Random Forest, Gradient Boosting) models trained on different feature combinations of GEX and antibody sequence. Additionally, we evaluate transfer learning approaches that use features generated from ESM-2, a general protein language model (PLM), and from AntiBERTy, an antibody-specific PLM, as inputs to these models. Our findings reveal that GEX-based models outperform antibody sequence-based models in specificity prediction, reaching F1 scores of up to 0.939, which highlights the intricate nature of immune repertoire modeling. Contrary to our expectations, using PLM features did not enhance predictive accuracy. Our research contributes to the computational discovery of antibody therapeutics, offering insights into B cell biology and serving as a dataset contribution to the development of ML approaches in this field.
Submission Number: 20
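For readers unfamiliar with the metric reported in the abstract, the following is a minimal sketch of how an F1 score is computed for a binary classifier. It assumes the binary form of F1 with "antigen-specific" as the positive class; the abstract does not say which averaging the authors used. The function name `f1_score` and the label lists are hypothetical and purely illustrative, not data from the study.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No correct positive predictions: F1 is defined as 0 here.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels (assumption): 1 = antigen-specific, 0 = non-specific.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

score = f1_score(y_true, y_pred)
print(f"F1 = {score:.3f}")
```

Because F1 combines precision and recall, it penalizes both missed antigen-specific cells and non-specific cells wrongly flagged as specific, which is presumably why it suits a screening setting of this kind.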