MSA Pairing Transformer: protein interaction partner prediction with few-shot contrastive learning

Published: 17 Jun 2024, Last Modified: 17 Jun 2024
Venue: AccMLBio Poster
License: CC BY 4.0
Keywords: protein language model, protein-protein interaction, MSA, contrastive learning
Abstract: We study the problem of matching interacting protein sequences across protein families that are known to interact. We propose to fine-tune the MSA Transformer to predict interaction partners by applying contrastive learning to embeddings of pairs of interacting domains in scrambled single-chain multiple sequence alignments (MSAs). We demonstrate the effectiveness of our model across a set of bacterial interactions for which ground-truth pairings are known, finding that it is possible to achieve high pairing accuracy even within small sets of pairable sequences, unlike previous methods based on models of co-evolutionary statistics. Across a large dataset of prokaryotic interactions with experimentally determined complexes, paired cross-chain MSAs generated by our model contain co-evolutionary signal that more strongly encodes interface contacts than MSAs paired by widely used heuristic methods. We believe that our approach offers a potential direction for further extending the successes of co-evolutionary analysis beyond individual proteins to protein-protein interactions.
Submission Number: 40
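
Illustrative sketch (not the authors' implementation): the abstract does not specify the loss or the pairing procedure, so the code below only shows one common form of contrastive learning on paired embeddings, a symmetric InfoNCE loss, together with a simple nearest-neighbour pairing step. The function names, the temperature value, the use of cosine similarity, and the toy two-dimensional embeddings are all assumptions made for illustration; in practice the embeddings would come from a model such as the MSA Transformer.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(emb_a, emb_b, temperature=0.1):
    """Symmetric InfoNCE loss (assumed form, not taken from the paper).

    emb_a[i] and emb_b[i] are treated as the embeddings of a true
    interacting pair; every other combination is a negative.
    """
    n = len(emb_a)
    # logits[i][j]: scaled similarity between sequence i of family A
    # and sequence j of family B.
    logits = [[cosine_similarity(a, b) / temperature for b in emb_b]
              for a in emb_a]

    def diagonal_cross_entropy(rows):
        # Mean cross-entropy where the correct class for row i is column i.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)  # subtract the max for numerical stability
            log_norm = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_norm - row[i]
        return total / n

    transposed = [list(col) for col in zip(*logits)]
    # Average the A-to-B and B-to-A directions.
    return 0.5 * (diagonal_cross_entropy(logits)
                  + diagonal_cross_entropy(transposed))


def nearest_partner(emb_a, emb_b):
    """For each A embedding, return the index of the most similar B embedding."""
    return [
        max(range(len(emb_b)), key=lambda j: cosine_similarity(a, emb_b[j]))
        for a in emb_a
    ]


# Toy embeddings: hypothetical values chosen only to exercise the functions.
toy_a = [[1.0, 0.0], [0.0, 1.0]]
toy_b_aligned = [[1.0, 0.0], [0.0, 1.0]]
toy_b_scrambled = [[0.0, 1.0], [1.0, 0.0]]

loss_aligned = info_nce_loss(toy_a, toy_b_aligned)
loss_scrambled = info_nce_loss(toy_a, toy_b_scrambled)
recovered = nearest_partner(toy_a, toy_b_scrambled)
```

In this toy case the loss is lower when partner embeddings are aligned than when they are scrambled, and the nearest-neighbour step recovers the permutation. A one-to-one matching (for example, a linear assignment solver over the similarity matrix) would be a natural alternative to the independent per-sequence choice used here.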