Track: Machine learning: computational method and/or computational results
Nature Biotechnology: Yes
Keywords: pocket finding, embedding learning, structural representation, sequence representation, cryptic pockets
TL;DR: We combine sequence and structural embeddings for binding-pocket prediction and perform competitively on relevant benchmarks.
Abstract: Accurately identifying protein binding sites is essential for drug discovery, yet existing computational methods often struggle to balance precision, recall, and scalability. We introduce PickPocket, a deep learning model that integrates sequence-derived evolutionary embeddings from ESM-2 with geometric structural representations from GearNet to predict ligand-binding sites at proteome scale. PickPocket leverages both residue-level sequence context and graph-based spatial relationships, enabling it to generalize across diverse protein families while maintaining high precision. Evaluated on the LIGYSIS benchmark, PickPocket achieves the highest F1 score (0.42) among state-of-the-art methods while maintaining a competitive MCC (0.37). PickPocket also predicts cryptic pockets effectively, surpassing specialized models like PocketMiner even without explicit training on ligand-induced conformational changes. Our large-scale analysis of 356,711 proteins further demonstrates PickPocket’s ability to identify novel binding sites across human drug targets. By combining evolutionary and geometric learning, PickPocket represents a scalable, data-driven approach for structure-based drug discovery.
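For readers less familiar with the two metrics quoted in the abstract, the following is a minimal sketch of how F1 and the Matthews correlation coefficient (MCC) are computed from binary labels. It assumes a residue-level evaluation in which each residue is labeled 1 (binding) or 0 (non-binding); the exact evaluation protocol of the LIGYSIS benchmark is not described here, and the function names and toy labels are purely illustrative.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count (tp, fp, fn, tn) for binary labels (1 = binding, 0 = non-binding)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, fp, fn, tn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)); 0.0 if undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy example (hypothetical labels, not data from the submission).
y_true = [1, 1, 0, 0, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(f"F1 = {f1_score(tp, fp, fn):.3f}, MCC = {mcc(tp, fp, fn, tn):.3f}")
```

Unlike F1, MCC also accounts for true negatives, which is one reason the two scores can rank methods differently on imbalanced data such as binding-site labels.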
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Presenter: ~Stelina_Tarasi1
Format: Yes, the presenting author will attend in person if this work is accepted to the workshop.
Funding: No, the presenting author of this submission does *not* fall under ICLR’s funding aims, or has sufficient alternate funding.
Submission Number: 94