Keywords: molecular docking, automated algorithm selection, graph neural networks
TL;DR: This paper proposes a GNN-based algorithm selection model to improve the accuracy and efficiency of molecular docking by selecting the most suitable docking methods for specific scenarios.
Abstract: Molecular docking is a major element in drug discovery and design. It enables the prediction of ligand-protein interactions by simulating the binding of small molecules to proteins. Despite the availability of numerous docking algorithms, there is no single algorithm consistently outperforms the others across a diverse set of docking scenarios. This paper introduces GNNAS-Dock, a novel Graph Neural Network (GNN)-based automated algorithm selection system for molecular docking in blind docking situations. GNNs are accommodated to process the complex structural data of both ligands and proteins. They benefit from the inherent graph-like properties to predict the performance of various docking algorithms under different conditions. The present study pursues two main objectives: 1) predict the performance of each candidate docking algorithm, in terms of Root Mean Square Deviation (RMSD), thereby identifying the most accurate method for specific scenarios; and 2) choose the best computationally efficient docking algorithm for each docking case, aiming to reduce the time required for docking while maintaining high accuracy. We validate our approach on PDBBind 2020 refined set, which contains about 5,300 pairs of protein-ligand complexes. Our strategy is performed across a portfolio of 6 different state-of-the-art docking algorithms. To be specific, the candidate algorithms are DiffDock, DSDP, TankBind, GNINA, SMINA, Qvina-W. We additionally combine p2rank with GNINA, SMINA and Qvina-W for docking site prediction. Therefore, there are totally 9 different algorithms for selection. Our algorithm selection model achieves a mean RMSD of approximately 1.74 Å, significantly improving upon the top performing docking algorithm (DiffDock), which has a mean RMSD of 2.95 Å. Moreover, when making selection in consideration of computational efficiency, our model demonstrates a success rate of 79.73% in achieving an RMSD below the 2 Å threshold, with a mean RMSD value of 2.75 Å and an average processing time of about 29.05 seconds per instance. In contrast, the remaining docking algorithms like TankBind, though faster with a processing time of merely 0.03 seconds per instance, only achieve an RMSD below the 2 Å threshold in less than 60% of cases. These findings demonstrate the capability of GNN-based algorithm selection to significantly enhance docking performance while effectively reducing the computational time required, balancing efficiency with precision in molecular docking.
Submission Number: 61
Loading