SE(3)-Invariant Multiparameter Persistent Homology for Chiral-Sensitive Molecular Property Prediction
Keywords: Drug discovery, molecular property prediction, topology, persistent homology
Abstract: We present a computational method for generating molecular fingerprints using multiparameter persistent homology (MPPH). The method targets areas such as drug discovery and materials science, where accurate molecular property prediction is vital. By integrating SE(3)-invariance with Vietoris-Rips persistent homology, we capture the three-dimensional structure that encodes molecular chirality. Chirality, an intrinsic feature of stereochemistry, is determined by the spatial arrangement of atoms within a molecule: a chiral molecule cannot be superimposed on its mirror image. This property directly influences molecular interactions and is therefore an essential factor in molecular property prediction. We explore the underlying topology of molecular structures by applying Vietoris-Rips persistent homology across varying scales and parameters such as atomic weight, partial charge, bond type, and chirality. The method can be further improved by incorporating additional parameters such as aromaticity, orbital hybridization, bond polarity, conjugated systems, and bond and torsion angles. Additionally, we use Stochastic Gradient Langevin Boosting (SGLB) to build a Bayesian ensemble of Gradient Boosting Decision Trees (GBDT), from which we obtain aleatoric and epistemic uncertainty estimates. Using these estimates, we prioritize high-uncertainty samples for active learning and model fine-tuning, which benefits scenarios where data labeling is costly or time-consuming. Our approach offers insights into molecular structure that traditional single-parameter or single-scale analyses do not provide. Compared with conventional graph neural networks (GNNs), which often suffer from over-smoothing and over-squashing, MPPH provides a more comprehensive and interpretable characterization of molecular data topology.
We substantiate our approach with theoretical stability guarantees and demonstrate its superior performance over existing state-of-the-art methods in predicting molecular properties through extensive evaluations on the MoleculeNet benchmark datasets.
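The role of a separate chirality parameter can be illustrated with a small sketch. It is not the authors' implementation; the toy coordinates, bond lengths, and the helper names `pairwise_distances`, `signed_volume`, and `random_rotation` are assumptions made for illustration. Pairwise distances, which are the only geometric input to a plain Vietoris-Rips filtration, are identical for a molecule and its mirror image, whereas a signed volume around the stereocenter is unchanged by rotations and translations (SE(3)) but flips sign under reflection.

```python
import numpy as np

def pairwise_distances(x):
    """Euclidean distance matrix of a point cloud with shape (n, 3)."""
    return np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)

def signed_volume(center, a, b, c):
    """Scalar triple product of three substituent vectors around a center."""
    return float(np.dot(a - center, np.cross(b - center, c - center)))

def random_rotation(rng):
    """Proper rotation (det = +1) from the QR factor of a Gaussian matrix."""
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] *= -1.0  # flip one axis to turn a reflection into a rotation
    return q

# Hypothetical stereocenter at the origin with four distinct substituents,
# modelled as tetrahedral directions scaled by different bond lengths.
directions = np.array(
    [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]], dtype=float
)
bond_lengths = np.array([1.0, 1.1, 1.2, 1.3])
mol = np.vstack([np.zeros(3), directions * bond_lengths[:, None]])

# Mirror image: reflect through the yz-plane (not an SE(3) transformation).
mirror = mol * np.array([-1.0, 1.0, 1.0])

# Rigid motion: random rotation plus translation (an SE(3) transformation).
rng = np.random.default_rng(0)
moved = mol @ random_rotation(rng).T + rng.normal(size=3)

# Distances cannot tell the two enantiomers apart.
same_distances = np.allclose(pairwise_distances(mol), pairwise_distances(mirror))

# The signed volume (center plus first three substituents) can.
vol_mol = signed_volume(*mol[:4])
vol_mirror = signed_volume(*mirror[:4])
vol_moved = signed_volume(*moved[:4])

print("distance matrices equal:", same_distances)
print("signed volumes (original, mirror, moved):", vol_mol, vol_mirror, vol_moved)
```

In this toy setting the distance matrices agree while the signed volume changes sign under reflection, which is why a distance-only filtration needs an additional chirality-aware parameter to separate enantiomers.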
Submission Track: Original Research
Submission Number: 23