MolCoMA: Complementary Masking Strategy for Promoting Atom-Level Multi-Modal Molecular Representation

27 Sept 2024 (modified: 05 Feb 2025)
Submitted to ICLR 2025
License: CC BY 4.0
Keywords: Multi-modal Fusion, Molecular Pretraining, Molecular Representation Learning
TL;DR: A complementary masking strategy enhances atom-level multi-modal molecular representations.
Abstract: Molecular representation learning, which captures the fundamental characteristics of chemical compounds, is crucial for AI-driven drug discovery. Existing methods integrate multiple modalities (e.g., 2D topology and 3D geometry) to learn robust representations. However, current multi-modal fusion strategies either align the embedding spaces of separate, independent models, thereby overlooking complementary information, or bridge modalities only at a coarse-grained level, failing to capture their inherent correlations. To enable fine-grained interactions between the intrinsic features of different modalities, this study presents MolCoMA, a pretraining framework for Molecular representation built on a unified encoder with a Complementary Masking mechanism. Specifically, we first employ two distinct encoders to capture the characteristics and structures unique to each modality. We then use a unified encoder with a customized complementary masking strategy to integrate information across modalities while mitigating overlap and similarity between the 2D and 3D representations. Finally, we incorporate a cross-modal reconstruction module to strengthen fine-grained interactions at the atomic level. Extensive experiments show that our model outperforms existing molecular pretraining methods on both 2D and 3D benchmarks, underscoring the effectiveness of our approach to fusing information across modalities.
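The abstract does not say how the complementary masks are drawn or what the reconstruction targets are. As a hedged illustration only, the sketch below assumes that the 2D and 3D encoders emit one token per atom in the same atom order, and that "complementary" means every atom hidden in one modality stays visible in the other, so the unified encoder must recover each hidden token from the opposite modality. Tokens are plain strings standing in for embeddings; all names (`complementary_masks`, `unified_input`, `cross_modal_targets`, `MASK`, `ratio`) are hypothetical and not taken from MolCoMA.

```python
import random
from typing import List, Optional, Sequence, Tuple

# Hypothetical placeholder for a learnable [MASK] embedding (assumption).
MASK = "[MASK]"


def complementary_masks(
    num_atoms: int, ratio: float = 0.5, seed: Optional[int] = None
) -> Tuple[List[bool], List[bool]]:
    """Return (mask_2d, mask_3d), where True means "hidden in that modality".

    Assumption: mask_3d is the exact complement of mask_2d, so each atom
    is visible in exactly one modality.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    rng = random.Random(seed)
    k = round(num_atoms * ratio)  # number of atoms hidden on the 2D side
    hidden_2d = set(rng.sample(range(num_atoms), k))
    mask_2d = [i in hidden_2d for i in range(num_atoms)]
    mask_3d = [not m for m in mask_2d]
    return mask_2d, mask_3d


def apply_mask(tokens: Sequence[str], mask: Sequence[bool]) -> List[str]:
    """Replace every masked atom token with the MASK placeholder."""
    return [MASK if m else t for t, m in zip(tokens, mask)]


def unified_input(
    tokens_2d: Sequence[str],
    tokens_3d: Sequence[str],
    ratio: float = 0.5,
    seed: Optional[int] = None,
) -> Tuple[List[str], List[bool], List[bool]]:
    """Build the concatenated 2D + 3D sequence fed to a unified encoder."""
    if len(tokens_2d) != len(tokens_3d):
        raise ValueError("2D and 3D token lists must be atom-aligned")
    mask_2d, mask_3d = complementary_masks(len(tokens_2d), ratio, seed)
    seq = apply_mask(tokens_2d, mask_2d) + apply_mask(tokens_3d, mask_3d)
    return seq, mask_2d, mask_3d


def cross_modal_targets(
    tokens_2d: Sequence[str], tokens_3d: Sequence[str], mask_2d: Sequence[bool]
) -> List[Tuple[int, str, str]]:
    """List (atom index, hidden modality, target token) for reconstruction.

    Each hidden token must be predicted using the same atom's token from
    the other modality, which is visible by construction.
    """
    targets = []
    for i, hidden_in_2d in enumerate(mask_2d):
        if hidden_in_2d:
            targets.append((i, "2d", tokens_2d[i]))
        else:
            targets.append((i, "3d", tokens_3d[i]))
    return targets


if __name__ == "__main__":
    seq, m2, m3 = unified_input(["C", "C", "O", "N"], ["c0", "c1", "o2", "n3"], seed=0)
    print(seq)
    print(cross_modal_targets(["C", "C", "O", "N"], ["c0", "c1", "o2", "n3"], m2))
```

Making the masks exact complements is one simple way to keep the two modalities from carrying the same visible atom twice, which matches the stated goal of reducing 2D/3D overlap; the actual MolCoMA strategy may use different ratios, structured masks, or embedding-level masking.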
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 10319