Abstract: Protein-protein interactions (PPIs) are crucial for many cellular activities and for disease development, and modulating PPIs with small-molecule inhibitors (PPIIs) has become a promising therapeutic strategy. Researchers have recently proposed several machine learning methods to screen for PPIIs, but most of them rely on unimodal molecular representations or combine multimodal features by naive concatenation. Progress is also slowed by the lack of large-scale PPII datasets. To address these issues, we propose MCLPPII, a unified multimodal contrastive learning framework for PPII prediction. MCLPPII extracts molecular information from four modalities and combines them through an adaptive feature fusion method. Furthermore, we propose a three-stage training strategy that enhances the PPII prediction capability of MCLPPII by leveraging self-supervised pre-training on a large unlabeled dataset. We evaluate MCLPPII on nine PPI targets and two downstream tasks: PPI inhibitor identification and potency prediction. Experimental results show that MCLPPII achieves competitive performance. The source code and datasets are freely available at https://github.com/1zzt/MCLPPII.