Multi-Task Training Increases Native Sequence Recovery of Antigen-Specific T-cell Receptor Sequences

Published: 17 Jun 2024, Last Modified: 27 Jul 2024. AccMLBio Poster. License: CC BY 4.0
Keywords: Machine Translation, Seq2Seq, Computational Immunology, Large Language Models
TL;DR: This paper examines multi-task training, specifically bidirectional translation between immune receptor-antigen pairs, as a way to learn a robust mapping between unseen peptide-MHC complexes and their cognate TCRs.
Abstract: T-cells are a critical component of the adaptive immune system; they use specialized T-cell receptors (TCRs) to bind non-self peptide fragments presented by major histocompatibility complex (MHC) molecules on the surface of other cells. Given their importance, a foundation model of TCR specificity that can reliably map between TCR sequences and their cognate peptide-MHC (pMHC) ligands remains an unmet need. This study presents a key step towards such a foundation model by exploring the bidirectional mapping from pMHCs to their corresponding TCRs and vice versa. Although validation performance was significantly worse in the TCR-to-pMHC direction, given the highly asymmetric distribution of pMHC data, we find that the bidirectionally trained model outperformed the model trained only in the pMHC-to-TCR direction, at the cost of diversity. We conduct a rigorous evaluation using well-characterized pMHCs and present our framework and findings as a potential direction towards a unified generative foundation model of TCR:pMHC cross-reactivity.
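To make the multi-task setup concrete, the sketch below shows one common way to turn each pMHC:TCR pair into two seq2seq training examples, one per translation direction, and one common definition of native sequence recovery. This is a minimal illustration under stated assumptions, not the authors' implementation: the direction tags (`<pmhc2tcr>`, `<tcr2pmhc>`), the `peptide|mhc` source encoding, the `Pair` type, and both helper functions are hypothetical, and recovery is assumed to be position-wise identity against the native sequence without alignment, since the abstract does not define the metric.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical direction tags; the paper's actual input format is not given in the abstract.
PMHC_TO_TCR = "<pmhc2tcr>"
TCR_TO_PMHC = "<tcr2pmhc>"


@dataclass(frozen=True)
class Pair:
    """One observed binding pair (hypothetical schema)."""
    peptide: str  # presented peptide sequence
    mhc: str      # MHC allele name
    tcr: str      # TCR sequence, e.g. a CDR3 beta chain


def make_bidirectional_examples(pairs: List[Pair]) -> List[Tuple[str, str]]:
    """Turn each pMHC:TCR pair into two (source, target) examples,
    one per translation direction, distinguished by a leading task tag."""
    examples: List[Tuple[str, str]] = []
    for p in pairs:
        pmhc = f"{p.peptide}|{p.mhc}"  # assumed pMHC encoding
        examples.append((f"{PMHC_TO_TCR} {pmhc}", p.tcr))  # pMHC -> TCR task
        examples.append((f"{TCR_TO_PMHC} {p.tcr}", pmhc))  # TCR -> pMHC task
    return examples


def native_sequence_recovery(generated: str, native: str) -> float:
    """Fraction of native positions reproduced exactly by the generated sequence.

    Positions are compared index by index with no alignment; a generated
    sequence shorter than the native one scores its missing positions as
    mismatches, and any extra generated residues are ignored.
    """
    if not native:
        raise ValueError("native sequence must be non-empty")
    matches = sum(g == n for g, n in zip(generated, native))
    return matches / len(native)


if __name__ == "__main__":
    # Illustrative input only, not data from the paper.
    pair = Pair("GILGFVFTL", "HLA-A*02:01", "CASSIRSSYEQYF")
    for src, tgt in make_bidirectional_examples([pair]):
        print(src, "->", tgt)
    print(native_sequence_recovery("CASSLRSSYEQYF", pair.tcr))
```

Training a single model on the mixed set lets both directions share parameters, and the tag tells the decoder which direction to produce. A pMHC-to-TCR-only baseline would keep just the first example from each pair.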
Submission Number: 49