When Does Context Help? A Systematic Study of Target-Conditional Molecular Property Prediction

Bryan Cheng; Jasper Zhang

Published: 03 Mar 2026, Last Modified: 26 Apr 2026, Venue: ICLR 2026 Workshop FM4Science Poster, License: CC BY 4.0

Keywords: foundation models for molecular design, molecular property prediction, context-conditional learning, FiLM modulation, drug discovery, graph neural networks, multi-task transfer learning, scientific benchmarking, temporal generalization, distribution shift, target-aware representations, data-efficient learning, virtual screening, hierarchical conditioning, real-world deployment

TL;DR: A systematic study of when context conditioning helps molecular foundation models. FiLM outperforms concatenation by 24.2 pp and additive conditioning by 8.6 pp, and on data-scarce CYP3A4 multi-task transfer reaches 2.9× the AUC of a per-target Random Forest (0.686 vs. 0.238), giving practical guidance for real-world scientific deployment.

Abstract: We present the first systematic study of when target context helps molecular property prediction, evaluating context conditioning across 10 diverse protein families, 4 fusion architectures, data regimes spanning 67–9,409 training compounds, and both temporal and random evaluation splits. Using NESTDRUG, a FiLM-based nested-learning architecture that conditions molecular representations on target identity, we characterize both success and failure modes with three principal findings. First, fusion architecture dominates: FiLM outperforms concatenation by 24.2 percentage points (pp) and additive conditioning by 8.6 pp; how you incorporate context matters more than whether you include it. Second, context enables otherwise impossible predictions: on data-scarce CYP3A4 (67 training compounds), multi-task transfer achieves 0.686 AUC where a per-target Random Forest collapses to 0.238, demonstrating that context conditioning unlocks viable prediction for novel targets where traditional approaches fail entirely. Third, context can systematically hurt: distribution mismatch causes 10.2 pp degradation on BACE1, and few-shot adaptation consistently underperforms zero-shot. Beyond methodology, we expose fundamental flaws in standard benchmarking: 1-nearest-neighbor Tanimoto similarity achieves 0.991 AUC on DUD-E without any learning, and 50% of actives leak from training data, rendering absolute performance metrics meaningless. Our temporal split evaluation (train ≤2020, test 2021–2024) achieves a stable 0.843 AUC with no degradation, providing the first rigorous evidence that context-conditional molecular representations generalize to future chemical space. These findings resolve a long-standing ambiguity in the field and establish clear decision boundaries for when context conditioning provides genuine value in drug discovery pipelines.
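To make the three fusion styles named in the abstract concrete, the following is a minimal NumPy sketch of FiLM, additive, and concatenation fusion applied to a single molecule embedding and a single target-context embedding. It is an illustration under stated assumptions, not the NESTDRUG implementation: the function names, the embedding sizes, the single linear projections, and the random weights are all hypothetical, and the fourth architecture mentioned in the abstract is not shown because the abstract does not name it.

```python
import numpy as np

def film_fuse(h, c, W_gamma, b_gamma, W_beta, b_beta):
    """FiLM: the context produces a per-feature scale (gamma) and shift (beta)."""
    gamma = c @ W_gamma + b_gamma  # shape (d_mol,)
    beta = c @ W_beta + b_beta     # shape (d_mol,)
    return gamma * h + beta

def additive_fuse(h, c, W):
    """Additive conditioning: the context only shifts the molecule features."""
    return h + c @ W

def concat_fuse(h, c):
    """Concatenation: any interaction is left to downstream layers."""
    return np.concatenate([h, c], axis=-1)

# Hypothetical sizes and random weights, chosen only for illustration.
rng = np.random.default_rng(0)
d_mol, d_ctx = 4, 3
h = rng.normal(size=d_mol)   # molecule embedding (e.g. from a GNN encoder)
c = rng.normal(size=d_ctx)   # target-context embedding
W_gamma = rng.normal(size=(d_ctx, d_mol))
b_gamma = np.ones(d_mol)     # scale starts near identity
W_beta = rng.normal(size=(d_ctx, d_mol))
b_beta = np.zeros(d_mol)

film_out = film_fuse(h, c, W_gamma, b_gamma, W_beta, b_beta)
add_out = additive_fuse(h, c, W_beta)
cat_out = concat_fuse(h, c)

print(film_out.shape, add_out.shape, cat_out.shape)
```

One way to read the sketch: with `W_gamma` set to zero and `b_gamma` set to one, `film_fuse` reduces exactly to `additive_fuse`, so FiLM can be seen as additive conditioning plus a context-dependent multiplicative gate on each molecular feature. Whether that gate accounts for the gap reported in the abstract is a question for the paper's experiments, not something this sketch demonstrates.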

Submission Number: 18
