The Discovery of Binding Modes Requires Rethinking Docking Generalization

Published: 27 Oct 2023, Last Modified: 29 Nov 2023 | GenBio@NeurIPS2023 Spotlight
Keywords: generalization, molecular docking, protein-ligand binding, diffusion models, benchmark, bootstrapping, self-training
TL;DR: A new benchmark to test docking generalization and a novel self-training technique to learn without access to data
Abstract: Accurate blind docking has the potential to lead to new biological breakthroughs, but for this promise to be realized, it is critical that docking methods generalize well across the proteome. However, existing benchmarks fail to rigorously assess generalizability. Therefore, we develop DockGen, a new benchmark based on the ligand-binding domains of proteins, and we show that machine learning-based docking models generalize very poorly, even when combined with various data augmentation strategies. To address this, we propose Confidence Bootstrapping, a new training paradigm that relies solely on the interaction between a diffusion model and a confidence model. Unlike previous self-training methods from other domains, we directly exploit the multi-resolution generation process of diffusion models, using rollouts and confidence scores to reduce the generalization gap. We demonstrate that Confidence Bootstrapping significantly improves the ability of ML-based docking methods to dock to unseen protein classes, edging closer to accurate and generalizable blind docking methods.
Submission Number: 37