SNOOPPI: Sequence-Normalized Database of On- and Off-Target Protein-Protein Interactions
Keywords: protein–protein interactions, sequence-resolved datasets, negative evidence, interaction benchmarking, biological networks
TL;DR: SNOOPPI is a sequence-resolved PPI dataset that makes cellular state explicit by encoding which specific protein variants do and do not physically interact.
Abstract: The set of physical protein-protein interactions (PPIs) realized in a cell defines a functional proteome whose interaction patterns constrain and characterize cellular state. PPIs are therefore a central means by which biological processes are executed and therapeutic interventions act. Here, we introduce **SNOOPPI**, a **S**equence-**N**ormalized database of **O**n- and **O**ff-target **P**rotein-**P**rotein **I**nteractions, which is the first unified dataset of binary PPIs that is isoform-, post-translational modification-, mutation-, and binding site-aware. By defining a PPI as a direct, physical interaction between two amino acid sequences, SNOOPPI overcomes several persistent limitations of existing PPI databases. SNOOPPI was curated from the IntAct database, taking full advantage of its experimental metadata and feature annotations to reclassify and uncover new PPIs. The final dataset comprises over 35.2K positive interactions and 5.3K negative interactions. SNOOPPI also retains 834.3K unresolved interactions, explicitly capturing gaps in the experimental literature. Beyond its usefulness as a reference dataset for the scientific community, SNOOPPI has the potential to serve as a high-confidence foundation for sequence-based modeling, benchmarking, and generative design of novel protein perturbations.
Presenter: ~Sophia_Vincoff1
Format: Yes, the presenting author will attend in person if this work is accepted to the workshop.
Funding: No, the presenting author of this submission does not fall under ICLR’s funding aims, or has sufficient alternate funding.
Submission Number: 62