Guide to the accompanying supplementary material.

data - CSV and text files containing TCR and pMHC sequences
	pmhc_stringent_split: training and validation split used for Table 1
	GLCTLVAML_holdout: training and validation split in which the training set contains all sequences except those for GLCTLVAML
	KLGGALQAK_holdout: training and validation split in which the training set contains all sequences except those for KLGGALQAK
	YVLDHLIVV_holdout: training and validation split in which the training set contains all sequences except those for YVLDHLIVV
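
The holdout splits above can be illustrated with a minimal sketch. It is not the code used to build the included files. The column names ("cdr3", "epitope"), the function name holdout_split, and the placeholder TCR strings are hypothetical, and the sketch assumes that the validation set holds the sequences for the removed peptide, which this guide does not state.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]


def holdout_split(
    rows: List[Row], held_out: str, key: str = "epitope"
) -> Tuple[List[Row], List[Row]]:
    """Split rows so that no training row carries the held-out peptide.

    Assumption: rows for the held-out peptide go to validation.
    """
    train = [r for r in rows if r[key] != held_out]
    val = [r for r in rows if r[key] == held_out]
    return train, val


# Placeholder rows; the "cdr3" values are dummy labels, not real sequences.
rows = [
    {"cdr3": "TCR_A", "epitope": "GLCTLVAML"},
    {"cdr3": "TCR_B", "epitope": "KLGGALQAK"},
    {"cdr3": "TCR_C", "epitope": "YVLDHLIVV"},
]

train, val = holdout_split(rows, "GLCTLVAML")
```

Here train keeps the KLGGALQAK and YVLDHLIVV rows, and val keeps only the GLCTLVAML row.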

notebooks - visual examination of results and generations
	evaluation.ipynb: running and visualizing evaluation metrics

refs - repository of biological reference information

src - framework code for facilitating dataset creation and model evaluation


Due to the large size of the model checkpoints, we were unable to include the .bin files. We include the training code for the no-pretraining Seq2Seq model, since it does not depend on the tokenized datasets that were too large to include with this submission.

Thanks for your understanding.