Information-Theoretic Requirements for Gradient-Based Task Affinity Estimation in Multi-Task Learning

Published: 02 Mar 2026 · Last Modified: 06 Apr 2026 · GEM 2026 · CC BY 4.0
Keywords: Multi-task learning, gradient conflicts, sample overlap, task affinity estimation, phase transitions, molecular property prediction, benchmark design, graph neural networks, drug discovery, negative transfer, task relationships, representation learning
TL;DR: We explain seven years of inconsistent MTL results: gradient-based task analysis requires ≥40% sample overlap, while benchmarks like MoleculeNet operate at <5%. A phase transition occurs near 30% overlap; with r = 0.94 across 93 tasks, we provide the first principled threshold for gradient-based task discovery.
Abstract: Multi-task learning shows strikingly inconsistent results: sometimes joint training helps substantially, sometimes it actively harms performance. Yet the field lacks a principled framework for predicting these outcomes. We identify a fundamental but unstated assumption underlying gradient-based task analysis: tasks must share training instances for gradient conflicts to reveal genuine relationships. When tasks are measured on the same inputs, gradient alignment reflects shared mechanistic structure; when measured on disjoint inputs, any apparent signal conflates task relationships with distributional shift. We discover that this sample overlap requirement exhibits a sharp phase transition: below 30% overlap, gradient-task correlations are statistically indistinguishable from noise; above 40%, they reliably recover known biological structure. Comprehensive validation across multiple datasets achieves strong correlations (r = 0.94 across 93 tasks) and recovers biological pathway organization. Standard benchmarks systematically violate this requirement: MoleculeNet operates at <5% overlap and TDC at 8–14%, far below the threshold where gradient analysis becomes meaningful. This provides the first principled explanation for seven years of inconsistent MTL results.
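The two quantities the abstract relates can be sketched in a few lines: the fraction of training instances two tasks share, and the alignment of their gradients computed on those instances. This is a minimal illustrative sketch, not the paper's implementation; the overlap metric (Jaccard over instance IDs) and the use of cosine similarity between mean per-sample gradients are assumptions for illustration.

```python
import numpy as np

def sample_overlap(ids_a, ids_b):
    """Jaccard overlap between two tasks' training-instance ID sets
    (hypothetical definition; the paper's exact metric may differ)."""
    ids_a, ids_b = set(ids_a), set(ids_b)
    return len(ids_a & ids_b) / len(ids_a | ids_b)

def gradient_alignment(grads_a, grads_b):
    """Cosine similarity between two tasks' mean per-sample gradients."""
    ga, gb = grads_a.mean(axis=0), grads_b.mean(axis=0)
    return float(ga @ gb / (np.linalg.norm(ga) * np.linalg.norm(gb)))

# Toy illustration: when both tasks' gradients are computed on the same
# inputs, a shared component dominates and alignment is high.
rng = np.random.default_rng(0)
shared = rng.normal(size=(50, 8))                    # shared gradient structure
grads_a = shared + 0.1 * rng.normal(size=(50, 8))    # task A gradients
grads_b = shared + 0.1 * rng.normal(size=(50, 8))    # task B gradients

print(sample_overlap(range(100), range(60, 160)))    # 40 shared / 160 total = 0.25
print(gradient_alignment(grads_a, grads_b))          # high, since structure is shared
```

Below the ~30% overlap threshold reported in the abstract, the shared component of the two gradient sets would be estimated from too few common instances for such an alignment score to be distinguishable from noise.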
Submission Number: 18