Interpreting GFlowNets for Drug Discovery: Extracting Actionable Insights for Medicinal Chemistry

Published: 22 Sept 2025 · Last Modified: 22 Sept 2025 · WiML @ NeurIPS 2025 · CC BY 4.0
Keywords: Interpretability, Generative Flow Networks, GFlowNets, Structured generative models, Molecular design, Drug discovery, Reaction graph modeling, Gradient-based saliency, Counterfactual analysis, Sparse autoencoders, Concept attribution, Latent representations, Trustworthy AI, Scientific machine learning, Medicinal chemistry
Abstract: Deep generative models are increasingly applied to molecular design in drug discovery, where they explore vast chemical spaces while respecting synthesizability constraints. SynFlowNet [1], a hierarchical Generative Flow Network (GFlowNet), addresses this challenge by constructing molecules through sequential application of reaction templates to building blocks. Yet the internal representations and decision policies of such models remain opaque, limiting interpretability and trust. From an ML perspective, hierarchical GFlowNets are an emerging class of structured generative models whose interpretability remains largely unexplored. Bridging this gap advances transparency in generative ML while yielding methods that extend beyond the chemistry domain. We introduce a unified interpretability framework for reaction-graph GFlowNets that adapts modern ML analysis tools to scientific generative models. Our approach integrates two complementary perspectives. First, gradient-based saliency with counterfactual analysis: we compute gradients of action log-probabilities and map them onto atom-level heatmaps. To move beyond correlation, we perturb chemical motifs with SMARTS-based masking and quantify the resulting probability shifts, yielding both attribution maps and causal evidence for which substructures drive decisions. Second, concept attribution in latent space: we train sparse autoencoders (SAEs) and linear probes on SynFlowNet embeddings to uncover interpretable factors and motifs encoded by the model. We find that SynFlowNet, when trained with QED (drug-likeness) as the reward, does not encode it in a single latent dimension. Instead, sparse autoencoders disentangle QED into interpretable axes such as size, polarity, and lipophilicity, which are more linearly predictable than QED itself. Linear probes accurately detect chemically meaningful motifs (e.g., functional groups, rings, halogens), showing that domain concepts are directly recoverable. Counterfactual analyses further improve QED optimization by identifying and altering reward-critical substructures. By combining saliency, counterfactuals, and concept attribution, our framework offers the first toolkit for interpreting GFlowNets in molecular design. This not only demonstrates how interpretability methods from vision and language can be extended to structured generative models, but also provides actionable insights for medicinal chemists, bridging model behavior to chemical reasoning and accelerating drug discovery.

[1] Miruna Cretu et al. SynFlowNet: Design of Diverse and Novel Molecules with Synthesis Constraints. arXiv preprint, 2024.
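To make the saliency computation concrete, the sketch below shows the core step in PyTorch under simplifying assumptions: `ToyPolicy`, the feature dimensions, and the chosen action index are all placeholders, not SynFlowNet's actual architecture, which conditions on a reaction-graph state and a hierarchical action space.

```python
# Minimal sketch of gradient-based saliency for a GFlowNet-style policy.
# ToyPolicy is a stand-in; SynFlowNet's real policy is reaction-graph based.
import torch
import torch.nn as nn

class ToyPolicy(nn.Module):
    """Stand-in forward policy: atom features -> action log-probabilities."""
    def __init__(self, n_atom_feats: int = 8, n_actions: int = 16):
        super().__init__()
        self.encoder = nn.Linear(n_atom_feats, 32)
        self.head = nn.Linear(32, n_actions)

    def forward(self, atom_feats: torch.Tensor) -> torch.Tensor:
        # Mean-pool atom embeddings into a state embedding, then score actions.
        h = torch.relu(self.encoder(atom_feats)).mean(dim=0)
        return torch.log_softmax(self.head(h), dim=-1)

def atom_saliency(policy: nn.Module, atom_feats: torch.Tensor, action: int) -> torch.Tensor:
    """Gradient of one action's log-probability w.r.t. atom features,
    reduced to one score per atom (L2 norm over the feature axis)."""
    atom_feats = atom_feats.clone().requires_grad_(True)
    log_probs = policy(atom_feats)
    log_probs[action].backward()
    return atom_feats.grad.norm(dim=-1)

policy = ToyPolicy()
feats = torch.randn(12, 8)                     # dummy molecule: 12 atoms, 8 features
heatmap = atom_saliency(policy, feats, action=3)
print(heatmap)                                 # one saliency value per atom
```

The per-atom scores are what get rendered as the atom-level heatmaps described in the abstract.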
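The SMARTS-based masking counterfactual admits a similarly small RDKit sketch. Here `score_fn` is a placeholder for a query to the trained policy; QED is substituted only so the example runs standalone, and the aspirin SMILES and hydroxyl SMARTS are illustrative choices.

```python
# Sketch of a SMARTS-masking counterfactual: delete a motif and measure
# the score shift. score_fn stands in for the trained policy's output.
from typing import Optional

from rdkit import Chem
from rdkit.Chem import QED

def motif_counterfactual(smiles: str, smarts: str, score_fn) -> Optional[float]:
    """Return score(original) - score(motif deleted), or None if the motif
    is absent or its deletion leaves an unsanitizable fragment."""
    mol = Chem.MolFromSmiles(smiles)
    motif = Chem.MolFromSmarts(smarts)
    if mol is None or not mol.HasSubstructMatch(motif):
        return None
    edited = Chem.DeleteSubstructs(mol, motif)
    try:
        Chem.SanitizeMol(edited)
    except Exception:
        return None
    return score_fn(mol) - score_fn(edited)

delta = motif_counterfactual(
    "CC(=O)Oc1ccccc1C(=O)O",   # aspirin, as an illustrative input
    "[OX2H]",                  # hydroxyl motif
    QED.qed,                   # stand-in score; the study queries the policy
)
print(delta)                   # positive => the motif raises the score
```

A nonzero shift under deletion is the causal evidence referred to above: the motif does not merely correlate with the decision, it moves the probability.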
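Finally, a compact sketch of the latent-space tools: a sparse autoencoder with an L1 penalty over exported embeddings, followed by a linear probe for a motif label. The embedding width, dictionary size, sparsity coefficient, and random data are assumptions for illustration, not the settings used in the study.

```python
# Sketch of the latent-space analysis: SAE over policy embeddings, then a
# linear probe. Random tensors stand in for exported SynFlowNet embeddings.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Overcomplete autoencoder with ReLU codes and an L1 sparsity penalty."""
    def __init__(self, d_model: int = 128, d_dict: int = 512):
        super().__init__()
        self.enc = nn.Linear(d_model, d_dict)
        self.dec = nn.Linear(d_dict, d_model)

    def forward(self, x):
        z = torch.relu(self.enc(x))      # non-negative, sparsity-friendly codes
        return self.dec(z), z

embeddings = torch.randn(1024, 128)      # stand-in for exported state embeddings
sae = SparseAutoencoder()
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
for _ in range(200):
    recon, z = sae(embeddings)
    loss = ((recon - embeddings) ** 2).mean() + 1e-3 * z.abs().mean()  # MSE + L1
    opt.zero_grad(); loss.backward(); opt.step()

# Linear probe: can a motif label (e.g. "contains a halogen") be read out
# linearly from the embeddings? Labels are random here; in practice they
# would come from RDKit substructure matches on the generated molecules.
from sklearn.linear_model import LogisticRegression
labels = torch.randint(0, 2, (1024,)).numpy()
probe = LogisticRegression(max_iter=1000).fit(embeddings.numpy(), labels)
print(probe.score(embeddings.numpy(), labels))
```

High probe accuracy on real labels is what licenses the claim that domain concepts are linearly recoverable from the model's latent space.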
Submission Number: 177