BioBlobs: Differentiable Graph Partitioning for Protein Representation Learning

20 Sept 2025 (modified: 11 Feb 2026) | Submitted to ICLR 2026 | Readers: Everyone | CC BY 4.0
Keywords: protein representation learning, graph partitioning, graph neural networks
TL;DR: BioBlobs learns a shared codebook of flexible, motif-level partitions of proteins, improving encoder performance while providing interpretable, function-relevant substructures.
Abstract: Protein function is driven by coherent substructures that vary in size and topology, yet current protein representation learning (PRL) models distort these signals by relying on rigid substructures such as $k$-hop and fixed-radius neighborhoods. We introduce $\textrm{BioBlobs}$, a plug-and-play, fully differentiable module that represents proteins by dynamically partitioning structures into flexibly sized, non-overlapping substructures ("blobs"). The resulting blobs are quantized into a shared and interpretable codebook, yielding a discrete vocabulary of function-relevant protein substructures used to compute protein embeddings. We show that $\textrm{BioBlobs}$ representations improve the performance of widely used protein encoders such as GVP-GNN across various PRL tasks. Our approach highlights the value of architectures that directly capture function-relevant protein substructures, enabling both improved predictive performance and mechanistic insight into protein function.
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 22839