Understanding Protein-DNA Interactions by Paying Attention to Protein and Genomics Foundation Models
Keywords: Protein-DNA interactions, Cross-Attention, Binding map prediction, Finetuning, Protein language models, Genomics language models
TL;DR: We couple protein and DNA foundation models with a cross-attention module to infer protein-DNA binding at single-amino-acid and single-nucleotide resolution.
Abstract: Protein-nucleic acid (NA) interactions play a key role in gene regulation. There is strong motivation to understand these interactions, with the goal of engineering them to solve biological problems. Current methods for quantifying protein-NA interactions are mainly experimental and are costly in both time and money. To mitigate this, deep learning methods have recently been applied to predict protein-DNA contacts. Although promising, these methods are computationally expensive and face challenges in accuracy. To address these challenges, we propose Seq2Contact, a novel method that predicts protein-NA binding at single-nucleotide (DNA) and single-amino-acid (protein) resolution. Seq2Contact uses protein and DNA foundation models to obtain amino-acid-specific and nucleotide-specific embeddings, then applies a cross-attention module to these embeddings to produce binding contact maps. We split the train and test data using sequence-similarity-based clustering and show empirically that Seq2Contact achieves state-of-the-art performance, beating existing baselines by almost 20% (F1-Score) for protein-NA binding prediction. Seq2Contact is also more computationally efficient, with up to 80% lower memory cost and more than 90% lower inference time. Code is available at https://github.com/DhruvaRajwade/Seq2Contact
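The abstract does not specify the cross-attention module in detail, so the following is only a minimal sketch of the general idea, not the Seq2Contact implementation. It assumes per-residue and per-nucleotide embeddings are already available (random arrays stand in for foundation-model outputs here), uses a single attention head, and applies a sigmoid to the scaled dot-product scores so that each residue-nucleotide pair gets an independent contact probability. The function name, projection matrices, dimensions, and the 0.5 threshold are all hypothetical.

```python
import numpy as np

def cross_attention_contact_map(protein_emb, dna_emb, w_q, w_k):
    """Score every (residue, nucleotide) pair and return contact probabilities.

    protein_emb: (L_protein, d_protein) per-residue embeddings
    dna_emb:     (L_dna, d_dna) per-nucleotide embeddings
    w_q, w_k:    hypothetical projections into a shared d_attn-dimensional space
    """
    queries = protein_emb @ w_q                      # (L_protein, d_attn)
    keys = dna_emb @ w_k                             # (L_dna, d_attn)
    # Scaled dot-product scores between residues (rows) and nucleotides (columns).
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])
    # Assumption: sigmoid instead of softmax, so pairs are scored independently.
    return 1.0 / (1.0 + np.exp(-scores))

# Placeholder inputs standing in for foundation-model embeddings.
rng = np.random.default_rng(0)
protein_emb = rng.normal(size=(120, 64))   # 120 residues, 64-dim embeddings
dna_emb = rng.normal(size=(30, 32))        # 30 nucleotides, 32-dim embeddings
w_q = rng.normal(size=(64, 16)) * 0.1      # untrained, illustrative weights
w_k = rng.normal(size=(32, 16)) * 0.1

contact_probs = cross_attention_contact_map(protein_emb, dna_emb, w_q, w_k)
contact_map = contact_probs > 0.5          # hypothetical decision threshold
```

In this sketch, entry `[i, j]` of `contact_map` marks a predicted contact between residue `i` and nucleotide `j`. With untrained weights the output is meaningless; in practice the projections would be learned against experimentally derived contact labels.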
Submission Number: 44