Information-Theoretic Requirements for Gradient-Based Task Affinity Estimation in Multi-Task Learning

Published: 02 Mar 2026, Last Modified: 08 Apr 2026, AI4Mat-ICLR-2026 Poster, CC BY 4.0
Keywords: molecular property prediction, multi-task learning for molecules, MoleculeNet benchmark limitations, QM9 quantum chemistry, materials screening efficiency, molecular assay alignment, task transfer in chemistry, drug discovery benchmarks, gradient-based task selection, negative transfer avoidance, compound library overlap, molecular ML reproducibility, accelerated property screening, TDC benchmark analysis, cheminformatics data quality
TL;DR: Explains seven years of inconsistent molecular MTL results: tasks need ≥40% compound overlap for gradient analysis to be valid. MoleculeNet (<5%) and TDC (8–14%) violate this requirement, while principled task grouping on QM9 and Tox21 improves screening by 3–4%.
Abstract: Multi-task learning shows strikingly inconsistent results—sometimes joint training helps substantially, sometimes it actively harms performance—yet the field lacks a principled framework for predicting these outcomes. We identify a fundamental but unstated assumption underlying gradient-based task analysis: tasks must share training instances for gradient conflicts to reveal genuine relationships. When tasks are measured on the same inputs, gradient alignment reflects shared mechanistic structure; when measured on disjoint inputs, any apparent signal conflates task relationships with distributional shift. We discover this sample overlap requirement exhibits a sharp phase transition: below 30% overlap, gradient-task correlations are statistically indistinguishable from noise; above 40%, they reliably recover known biological structure. Comprehensive validation across multiple datasets achieves strong correlations and recovers biological pathway organization. Standard benchmarks systematically violate this requirement—MoleculeNet operates at <5% overlap, TDC at 8–14%—far below the threshold where gradient analysis becomes meaningful. This provides the first principled explanation for seven years of inconsistent MTL results.
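The sample overlap requirement described in the abstract can be checked before any gradient analysis is run. The following is a minimal sketch, not the authors' implementation: the abstract does not state how overlap is computed, so this example assumes a Jaccard-style definition (compounds labeled for both tasks divided by compounds labeled for either task) and assumes labels are stored as a compounds-by-tasks matrix with NaN marking unmeasured entries. The function names `pairwise_overlap` and `pairs_above_threshold` are hypothetical; only the 40% threshold comes from the abstract.

```python
import numpy as np


def pairwise_overlap(labels: np.ndarray) -> np.ndarray:
    """Return a (n_tasks, n_tasks) matrix of pairwise sample overlap.

    labels: float array of shape (n_compounds, n_tasks); NaN = not measured.
    Overlap is ASSUMED here to be Jaccard: |A and B| / |A or B|.
    """
    measured = (~np.isnan(labels)).astype(np.int64)  # 1 where a label exists
    shared = measured.T @ measured                   # compounds labeled for both tasks
    counts = measured.sum(axis=0)                    # compounds labeled per task
    union = counts[:, None] + counts[None, :] - shared
    with np.errstate(divide="ignore", invalid="ignore"):
        # Pairs with no labeled compounds at all get overlap 0.
        return np.where(union > 0, shared / union, 0.0)


def pairs_above_threshold(overlap: np.ndarray, threshold: float = 0.40):
    """List task pairs (i, j), i < j, whose overlap meets the threshold.

    The default 0.40 is the threshold reported in the abstract.
    """
    n_tasks = overlap.shape[0]
    return [
        (i, j)
        for i in range(n_tasks)
        for j in range(i + 1, n_tasks)
        if overlap[i, j] >= threshold
    ]


# Toy label matrix: 4 compounds, 3 tasks (values are illustrative only).
nan = np.nan
toy_labels = np.array(
    [
        [1.0, 0.0, nan],
        [0.0, 1.0, nan],
        [1.0, nan, 1.0],
        [nan, nan, 0.0],
    ]
)
toy_overlap = pairwise_overlap(toy_labels)
eligible_pairs = pairs_above_threshold(toy_overlap)
```

In this toy matrix, tasks 0 and 1 share two of three labeled compounds, so they are the only pair that clears the 40% threshold; tasks 1 and 2 share none. Under the abstract's argument, gradient-based affinity estimates would only be interpreted for pairs returned by `pairs_above_threshold`.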
Submission Track: Full Paper
Submission Category: AI-Guided Design
Submission Number: 54