Information-Theoretic Requirements for Gradient-Based Task Affinity Estimation in Multi-Task Learning

Bryan Cheng; Jasper Zhang

Published: 03 Mar 2026, Last Modified: 26 Apr 2026. Venue: ICLR 2026 Workshop FM4Science Poster. License: CC BY 4.0

Keywords: science-first benchmark design, molecular foundation models, principled multi-task learning, scientific data requirements, real-world reproducibility, MoleculeNet limitations, quantum chemistry validation, drug discovery benchmarks, task relationship discovery, biological pathway recovery, scientific ML rigor, sample alignment principles, negative transfer in science, TDC benchmark analysis, foundation model training data

TL;DR: A science-first design principle for molecular foundation models: tasks require ≥40% sample overlap for gradient analysis to be valid. Standard benchmarks violate this requirement, which explains seven years of inconsistent results and enables principled task grouping (+4%).

Abstract: Multi-task learning shows strikingly inconsistent results—sometimes joint training helps substantially, sometimes it actively harms performance—yet the field lacks a principled framework for predicting these outcomes. We identify a fundamental but unstated assumption underlying gradient-based task analysis: tasks must share training instances for gradient conflicts to reveal genuine relationships. When tasks are measured on the same inputs, gradient alignment reflects shared mechanistic structure; when measured on disjoint inputs, any apparent signal conflates task relationships with distributional shift. We discover this sample overlap requirement exhibits a sharp phase transition: below 30% overlap, gradient-task correlations are statistically indistinguishable from noise; above 40%, they reliably recover known biological structure. Comprehensive validation across multiple datasets achieves strong correlations and recovers biological pathway organization. Standard benchmarks systematically violate this requirement—MoleculeNet operates at <5% overlap, TDC at 8–14%—far below the threshold where gradient analysis becomes meaningful. This provides the first principled explanation for seven years of inconsistent MTL results.
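
As an illustration of the overlap requirement described in the abstract, the following is a minimal sketch rather than the authors' implementation. It assumes that sample overlap is measured as the fraction of shared instance IDs over the union of both tasks' training sets (the abstract does not state the exact definition) and that gradient alignment is measured by cosine similarity between per-task gradient vectors. The function names and the `min_overlap` parameter are hypothetical; the 0.40 default mirrors the threshold quoted above.

```python
import math


def sample_overlap(ids_a, ids_b):
    """Fraction of training instances shared by two tasks.

    Assumption: overlap = |A ∩ B| / |A ∪ B|; the paper may define it differently.
    """
    a, b = set(ids_a), set(ids_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def gradient_cosine(grad_a, grad_b):
    """Cosine similarity between two flattened gradient vectors."""
    dot = sum(x * y for x, y in zip(grad_a, grad_b))
    norm_a = math.sqrt(sum(x * x for x in grad_a))
    norm_b = math.sqrt(sum(y * y for y in grad_b))
    denom = norm_a * norm_b
    return dot / denom if denom else 0.0


def affinity_if_valid(ids_a, ids_b, grad_a, grad_b, min_overlap=0.40):
    """Return (overlap, affinity); affinity is None when overlap is too low.

    min_overlap is a hypothetical parameter set to the 40% threshold
    reported in the abstract.
    """
    overlap = sample_overlap(ids_a, ids_b)
    if overlap < min_overlap:
        # Below the threshold, the abstract argues the gradient signal
        # is confounded with distributional shift, so report no affinity.
        return overlap, None
    return overlap, gradient_cosine(grad_a, grad_b)
```

For example, two tasks sharing two of six distinct molecules have an overlap of about 0.33 under this definition, so `affinity_if_valid` would decline to report an affinity score for them.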

Submission Number: 23
