Keywords: protein–molecule interaction, GPCRs, olfactory receptors, dose–response modeling, EC50 prediction, protein language modelling, graph neural networks
TL;DR: We reframe GPCR activity and $EC_{50}$ prediction as three steps: (1) sampling binary activity labels at a concentration $c$, (2) training a model to predict activity at any $c$, and (3) fitting a dose-response curve to the predicted activity across a range of concentrations.
Abstract: ML models have revolutionised structural biology and significantly advanced drug discovery, yet they struggle with predicting the ligand-induced activity of G protein-coupled receptors (GPCRs). GPCRs are membrane proteins acting as cellular "sensors" which trigger a cascade of intracellular processes upon binding a diverse set of molecules. Human GPCRs account for nearly $30$% of targets of approved drugs, and approximately half of human GPCRs are olfactory receptors (ORs). Beyond their role in smell perception, ORs are increasingly linked to diseases such as obesity, diabetes, asthma, and cancer. The core interest and difficulty in modelling the molecule-induced response of ORs and GPCRs lie in predicting activity and potency (i.e. half-maximal effective concentration, $EC_{50}$). In this paper, we propose a new way of modelling these properties. Instead of direct regression on $EC_{50}$ values, we mimic in vitro dose-response assays by sampling binary activity labels for a protein-molecule pair $(s, m)$ at a molecular concentration $c$. We then design a novel model that learns the activation probability $P(\text{active} \mid s, m, c)$ at any given $c$. Finally, querying the model across concentrations enables fitting a logistic curve, from which both activity (curve maximum) and $EC_{50}$ (inflection point) are derived. On test sets of $1155$ protein-molecule pairs, our framework improves activity prediction by $10$% over the state-of-the-art. For $EC_{50}$ estimation, it achieves an error of $0.725$ log units, reducing the error by $40$% compared to a regression baseline and surpassing the affinity module of Boltz-2 by $0.35$ log units. Notably, our approach effectively identifies novel active scaffolds, demonstrating its potential to replace expensive in vitro primary screening. The proposed framework is protein-agnostic and can be extended to a broad range of drug discovery applications.
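The final step, fitting a logistic curve to activation probabilities queried across concentrations, can be illustrated with a minimal sketch. It is not the implementation from the paper. The three-parameter curve in log10 concentration, the grid-search fit, the function names, and the stand-in `predicted_activation` (which replaces the trained model with a known curve) are all assumptions made for illustration.

```python
import math


def logistic_curve(log_c, top, slope, log_ec50):
    """Activation probability at log10 concentration `log_c`.

    `top` is the curve maximum; `log_ec50` is the inflection point.
    """
    return top / (1.0 + math.exp(-slope * (log_c - log_ec50)))


def fit_dose_response(log_concs, probs):
    """Least-squares grid search over (top, slope, log_ec50).

    A deliberately simple stand-in for a proper curve-fitting routine.
    """
    tops = [i / 20 for i in range(1, 21)]        # 0.05 .. 1.00
    slopes = [0.5 * i for i in range(1, 13)]     # 0.5 .. 6.0
    lo, hi = min(log_concs), max(log_concs)
    mids = [lo + (hi - lo) * i / 200 for i in range(201)]

    best = None
    for top in tops:
        for slope in slopes:
            for mid in mids:
                err = sum(
                    (logistic_curve(x, top, slope, mid) - p) ** 2
                    for x, p in zip(log_concs, probs)
                )
                if best is None or err < best[0]:
                    best = (err, top, slope, mid)
    _, top, slope, mid = best
    return {"top": top, "slope": slope, "log_ec50": mid}


def predicted_activation(log_c):
    """Hypothetical stand-in for the model's P(active | s, m, c)."""
    return logistic_curve(log_c, top=0.9, slope=2.0, log_ec50=-5.0)


# Query the stand-in model from 10^-8 M to 10^-2 M in half-log steps.
log_concs = [-8.0 + 0.5 * i for i in range(13)]
probs = [predicted_activation(x) for x in log_concs]

fit = fit_dose_response(log_concs, probs)
ec50_molar = 10 ** fit["log_ec50"]   # potency read from the inflection point
max_activity = fit["top"]            # activity read from the curve maximum
```

In this sketch, activity would be read from `max_activity` and potency from `ec50_molar`; any threshold on the curve maximum used to call a pair active is a separate choice that the abstract does not specify.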
Supplementary Material: zip
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 20463