Multi-Task Training Increases Native Sequence Recovery of Antigen-Specific T-cell Receptor Sequences
Keywords: Machine Translation, Seq2Seq, Computational Immunology, Large Language Models
TL;DR: This paper looks at multi-task training, specifically bidirectional translation between immune receptor-antigen pairs, to learn a robust mapping between unseen peptide-MHC complexes and their cognate TCRs.
Abstract: T-cells are a critical component of the adaptive immune system that use T-cell receptors (TCRs) to bind highly specific non-self peptide fragments presented by major histocompatibility complex (MHC) molecules on the surface of other cells. Given their importance, a foundation model of TCR specificity capable of reliably mapping between TCR sequences and their cognate peptide-MHC (pMHC) ligands remains an unmet need. This study presents a key step towards developing such a comprehensive foundation model by exploring the bidirectional mapping from pMHCs to their corresponding TCRs, and vice versa. While validation performance was significantly worse in the TCR-to-pMHC direction, given the highly asymmetric distribution of pMHC data, we find that the bidirectionally trained model outperformed the model trained only in the pMHC-to-TCR direction. We present our findings as a potential direction towards a unified generative foundation model of TCR:pMHC cross-reactivity.
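The bidirectional multi-task setup described in the abstract can be sketched as a data-preparation step: each pMHC:TCR pair yields two seq2seq translation examples, one per direction, distinguished by a direction token so a single model learns both tasks jointly. This is a minimal illustrative sketch; the direction tokens (`<P2T>`, `<T2P>`), the dict format, and the example sequences are assumptions, not the authors' actual pipeline.

```python
def make_bidirectional_examples(pairs):
    """Expand (pmhc, tcr) pairs into two seq2seq examples each.

    A prefix token tells the model which translation direction is being
    trained, so one model is jointly trained on both mappings. Token
    names here are hypothetical placeholders.
    """
    examples = []
    for pmhc, tcr in pairs:
        # pMHC -> TCR direction (the primary generation task).
        examples.append({"src": f"<P2T> {pmhc}", "tgt": tcr})
        # TCR -> pMHC direction (the auxiliary, data-sparse task).
        examples.append({"src": f"<T2P> {tcr}", "tgt": pmhc})
    return examples

# Illustrative pair: a peptide|allele string and a CDR3-beta sequence.
pairs = [("GILGFVFTL|HLA-A*02:01", "CASSIRSSYEQYF")]
data = make_bidirectional_examples(pairs)
```

A model trained on `data` sees the same pair twice, once per direction, which is the multi-task signal credited here with improving pMHC-to-TCR recovery.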
Submission Number: 49