Abstract: Predicting molecular properties is crucial across scientific and industrial domains such as drug discovery and materials science. Contrastive learning has gained prominence as an effective method for this task. However, routine strategies often overlook substructure information and risk teaching the model incorrect knowledge, because augmentation can change a molecule's properties. To address these issues, we propose a novel molecular contrastive learning model named Molecular Contrastive Learning with Learnable Weighted Substructures (MolCLW). Our model enhances the learning of molecular substructures by comparing molecular representations generated at the molecular level and at the substructure level. Meanwhile, a transformer encoder module is dedicated to learning the weight scores of substructures, enabling the model to adjust the similarity level between positive pairs according to which substructures are masked. Compared with baselines, MolCLW achieves an average 2.2% improvement in ROC-AUC on 6 classification benchmarks and an average 4.3% decrease in error on 5 regression benchmarks. Furthermore, during fine-tuning, the model can attend to the substructures relevant to the downstream task by adjusting the weights of different substructures. The visualization results establish a link between downstream targets and substructure function, providing valuable insights and strong support for research in drug discovery and chemical reactions.