Conditional Generation of Antigen Specific T-cell Receptor Sequences

Published: 27 Oct 2023, Last Modified: 30 Nov 2023, GenBio@NeurIPS 2023 Poster
Keywords: Large Language Models, Seq2Seq, Immunology, Many to Many
TL;DR: This work explores the use of sequence-to-sequence transformer models to generate antigen-specific T-cell receptor sequences conditional on peptide-MHC complexes, introducing tailored evaluation metrics to account for data sparsity.
Abstract: Training and evaluating large language models (LLMs) for use in the design of antigen-specific T-cell receptor (TCR) sequences is challenging due to the complex many-to-many mapping between TCRs and their targets, a difficulty exacerbated by a severe lack of ground-truth data. Traditional NLP metrics can be artificially poor indicators of model performance since labels are concentrated on a few examples, and functional in-vitro assessment of generated TCRs is time-consuming and costly. Here, we introduce TCR-BART and TCR-T5, adapted from the prominent BART and T5 models, to explore the use of these LLMs for conditional TCR sequence generation given a specific target epitope. To fairly evaluate such models with limited labeled examples, we propose novel evaluation metrics tailored to the sparsely sampled many-to-many nature of TCR-epitope data and investigate the interplay between accuracy and diversity of generated TCR sequences.
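The abstract describes conditional seq2seq generation of TCR sequences from a target epitope. Below is a minimal illustrative sketch (not the authors' TCR-T5 implementation) of how such conditioning can be set up with a T5-style encoder-decoder: the epitope is the encoder input and the TCR sequence is the decoder target, with a toy amino-acid vocabulary and placeholder sequences that are assumptions for illustration only.

```python
# Illustrative sketch of epitope-conditioned TCR generation with a T5-style
# seq2seq model. The vocabulary, model size, and example sequences are
# placeholders, not the paper's actual setup.
import torch
from transformers import T5Config, T5ForConditionalGeneration

# Toy amino-acid vocabulary: 20 residues plus pad/eos tokens (illustrative).
AA = "ACDEFGHIKLMNPQRSTVWY"
vocab = {"<pad>": 0, "</s>": 1, **{a: i + 2 for i, a in enumerate(AA)}}

def encode(seq: str) -> list[int]:
    """Map an amino-acid string to token ids, appending the EOS token."""
    return [vocab[a] for a in seq] + [vocab["</s>"]]

config = T5Config(
    vocab_size=len(vocab),
    d_model=128, d_ff=256, num_layers=2, num_heads=4,  # tiny, for illustration
    pad_token_id=vocab["<pad>"], eos_token_id=vocab["</s>"],
    decoder_start_token_id=vocab["<pad>"],
)
model = T5ForConditionalGeneration(config)

# One hypothetical epitope -> TCR CDR3beta pair (placeholder sequences).
epitope_ids = torch.tensor([encode("GILGFVFTL")])
tcr_ids = torch.tensor([encode("CASSIRSSYEQYF")])

# Standard seq2seq training step: condition on the epitope, predict the TCR.
loss = model(input_ids=epitope_ids, labels=tcr_ids).loss
loss.backward()

# After training, sampling yields multiple candidate TCRs per epitope,
# which is where the accuracy/diversity trade-off discussed above arises.
samples = model.generate(
    epitope_ids, do_sample=True, top_p=0.9,
    num_return_sequences=5, max_new_tokens=24,
)
```

In practice, sampling several candidates per epitope is what makes the many-to-many evaluation problem concrete: each generated set must be scored against only the handful of known cognate TCRs for that epitope.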
Supplementary Materials: zip
Submission Number: 88