PretrainedBA: Enhancing Compound-Protein Binding Affinity Prediction Accuracy via Pre-training Large-Scale Interaction Information

Sangmin Seo, Seungyeon Choi, Hwanhee Kim, Sanghyun Park

Published: 2024 (BIBM 2024), Last Modified: 06 Feb 2025, License: CC BY-SA 4.0

Abstract: Identifying drug candidates that bind a specific target protein with high affinity is an important goal in early-stage drug discovery. Affinity prediction methods based on compound-protein complex structures have shown promising accuracy, but their dependence on high-resolution three-dimensional (3D) complex structures considerably limits their practical application. Many complex-free binding affinity prediction methods have been proposed as an alternative; however, they still do not effectively compensate for the missing binding information. The interpretability of compound-protein interactions, in particular, remains a significant open challenge. To alleviate these limitations of current complex-free models, we propose PretrainedBA, a predictive model that applies pre-training strategies to large-scale datasets, including interaction data. Rather than pre-training compounds and proteins independently, as in existing studies, PretrainedBA pre-trains the interdependent relationships between them. PretrainedBA consists of six modules and is designed to effectively model compound-protein interactions within identified binding pockets. Comparisons with state-of-the-art complex-free models on seven external benchmark datasets show that this pre-training strategy improves binding affinity prediction accuracy. PretrainedBA's markedly stronger interpretation of compound-protein interaction mechanisms, compared with the previous method, further underscores its value. Finally, a real-world application evaluation on the Directory of Useful Decoys, Enhanced (DUD-E) dataset confirmed PretrainedBA's practical applicability and its utility in drug discovery.
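The abstract does not say which metric was used for the DUD-E evaluation. DUD-E pairs known actives with property-matched decoys, and virtual-screening results on it are commonly summarized with the enrichment factor (EF): the hit rate of actives in the top-ranked fraction of compounds divided by the hit rate in the whole library. The sketch below is a minimal, hypothetical illustration of that metric, assuming the model's predicted affinities are used directly as ranking scores (higher means a stronger predicted binder). The function name `enrichment_factor` and its parameters are illustrative, not part of PretrainedBA.

```python
from typing import Sequence


def enrichment_factor(scores: Sequence[float], labels: Sequence[int], fraction: float = 0.01) -> float:
    """Hypothetical EF at the top `fraction` of a ranked screening library.

    scores: predicted affinities, assumed higher = stronger predicted binder
    labels: 1 for an active compound, 0 for a decoy
    """
    if not scores or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and of equal length")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")

    n_total = len(scores)
    n_actives = sum(labels)
    if n_actives == 0:
        raise ValueError("at least one active compound is required")

    # Rank all compounds from highest to lowest predicted affinity.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    # Size of the top slice; round() avoids float artefacts such as 10 * 0.3.
    n_top = max(1, round(n_total * fraction))
    actives_in_top = sum(label for _, label in ranked[:n_top])

    # Hit rate in the top slice divided by the hit rate in the full library.
    return (actives_in_top / n_top) / (n_actives / n_total)


if __name__ == "__main__":
    # Toy data (not from the paper): 2 actives among 10 compounds.
    preds = [9.1, 8.7, 7.5, 6.0, 5.2, 4.8, 4.1, 3.9, 3.3, 2.0]
    is_active = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
    print(enrichment_factor(preds, is_active, fraction=0.2))
```

In this toy example, one of the two top-ranked compounds is active (hit rate 0.5) against a library-wide hit rate of 0.2, so EF at 20% is 2.5. An EF of 1 means the ranking is no better than random ordering.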