Boosting Drug-Target Affinity Prediction from Nearest Neighbors

Qizhi Pei; Lijun Wu; Jinhua Zhu; Zhenyu He; Yingce Xia; Shufang Xie; Tao Qin; Rui Yan; Tie-Yan Liu

22 Sept 2022 (modified: 13 Feb 2023), ICLR 2023 Conference Withdrawn Submission, Readers: Everyone

Abstract: Precisely predicting Drug-Target binding Affinity (DTA) is essential for drug discovery. Deep learning methods have recently become popular for DTA prediction, but their accuracy is still far from satisfactory. In this work, inspired by the recent success of retrieval methods, we propose $k$NN-DTA, a non-parametric embedding-based retrieval method applied to a pre-trained DTA prediction model, which extends the power of the neural DTA model at little or no additional cost. Compared to traditional chemical similarity retrieval, our embedding-based retrieval is highly efficient. Unlike existing methods, we introduce two ways of aggregating neighbors, one in the embedding space and one in the label space, and integrate them in a unified framework. Specifically, we propose a \emph{label aggregation} with \emph{pair-wise retrieval} and a \emph{representation aggregation} with \emph{point-wise retrieval} of the nearest neighbors. The method runs in the inference phase and can efficiently boost DTA prediction performance with no training cost. In addition, we propose an extension, Ada-$k$NN-DTA, an instance-wise and adaptive aggregation with lightweight learning. Results on four benchmark datasets show that $k$NN-DTA brings significant improvements and outperforms previous state-of-the-art (SOTA) results; e.g., on the BindingDB IC$_{50}$ and $K_i$ testbeds, $k$NN-DTA sets new records with RMSE scores of $\mathbf{0.687}$ and $\mathbf{0.748}$, each a $\mathbf{4}$-point improvement. The extended Ada-$k$NN-DTA further improves performance, e.g., another $\mathbf{1}$-point gain on BindingDB. These results demonstrate the effectiveness and efficiency of our method. Results in other settings and comprehensive studies and analyses also show the potential of our $k$NN-DTA approach.
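To make the label aggregation with pair-wise retrieval more concrete, here is a minimal sketch of how such an inference-time step might look. The abstract does not give the exact formulas, so everything specific here is an assumption for illustration: the function name `knn_label_aggregation`, the squared Euclidean distance, the softmax weighting with a `temperature`, and the interpolation weight `lam` between the retrieved estimate and the model's own prediction.

```python
import numpy as np


def knn_label_aggregation(query_emb, store_embs, store_labels, model_pred,
                          k=4, temperature=1.0, lam=0.5):
    """Hypothetical sketch of kNN label aggregation (not the authors' code).

    query_emb:    (d,) embedding of the query drug-target pair
    store_embs:   (n, d) embeddings of stored training pairs
    store_labels: (n,) affinity labels of the stored pairs
    model_pred:   scalar prediction from the pre-trained DTA model
    """
    # Squared Euclidean distance from the query to every stored pair (assumed metric).
    dists = np.sum((store_embs - query_emb) ** 2, axis=1)
    # Indices of the k closest stored pairs.
    nn_idx = np.argsort(dists)[:k]
    # Softmax over negative distances: closer neighbors get larger weights.
    logits = -dists[nn_idx] / temperature
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    # Distance-weighted average of the neighbors' affinity labels.
    knn_pred = float(weights @ store_labels[nn_idx])
    # Interpolate the retrieved estimate with the model's prediction (assumed form).
    return lam * knn_pred + (1.0 - lam) * model_pred
```

In this sketch no parameters are trained: the datastore is built once from the pre-trained model's embeddings, and `k`, `temperature`, and `lam` would be chosen on a validation set. Setting `lam=0` recovers the original model prediction.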

Please Choose The Closest Area That Your Submission Falls Into: Machine Learning for Sciences (e.g., biology, physics, health sciences, social sciences, climate/sustainability)
