Multimodal out-of-distribution individual uncertainty quantification enhances binding affinity prediction for polypharmacology
Abstract: Polypharmacology, a single drug that targets multiple proteins, holds promise for addressing unmet medical needs. Achieving accurate, reliable and scalable predictions of protein–ligand binding affinity across multiple proteins is crucial to realizing the potential of polypharmacology. Machine learning offers a powerful tool for multitarget binding affinity prediction. However, three major challenges remain: generalizing predictions to out-of-distribution compounds that are structurally different from those in the training data; quantifying the uncertainty of predictions in out-of-distribution scenarios where the assumption underlying existing methods does not hold; and scaling to billions of compounds, which remains unattainable for current structure-based methods. Here, to overcome these challenges, we propose a model-agnostic anomaly detection-based individual uncertainty quantification method: embedding Mahalanobis Outlier Scoring and Anomaly Identification via Clustering (eMOSAIC). eMOSAIC features the divergence between the multimodal representations of known cases and unseen instances and quantifies individual prediction uncertainty on a compound-by-compound basis. We integrate eMOSAIC with a multimodal deep neural network for multitarget ligand binding affinity predictions, leveraging a structure-informed large protein language model. Comprehensive validation in out-of-distribution settings demonstrates that eMOSAIC significantly outperforms state-of-the-art sequence-based and structure-based methods as well as existing uncertainty quantification approaches. These findings underscore eMOSAIC’s potential to advance real-world polypharmacology and other applications that require robust predictions and scalable solutions.
Loading