Gradient Coordination for Quantifying and Maximizing Knowledge Transference in Multi-Task Learning

Published: 01 Jan 2023 · Last Modified: 05 Mar 2025 · SIGIR 2023 · CC BY-SA 4.0
Abstract: Multi-task learning (MTL) has been widely applied in online advertising systems. To address the negative transfer issue, recent optimization methods have emphasized aligning gradients in direction or magnitude. However, since prior studies have shown that shared modules contain both general and task-specific knowledge, overemphasizing gradient alignment may crowd out task-specific knowledge. In this paper, we propose CoGrad, a transference-driven approach that adaptively maximizes knowledge transference via Coordinated Gradient modification. We explicitly quantify transference as the loss reduction one task induces on another, and optimize it to derive an auxiliary gradient. By incorporating this gradient into the original task gradients, the model automatically maximizes inter-task transfer while minimizing individual task losses, harmonizing general and task-specific knowledge. In addition, we introduce an efficient approximation of the Hessian matrix, making CoGrad computationally efficient. Both offline and online experiments verify that CoGrad significantly outperforms previous methods.
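The sketch below illustrates the idea reconstructed from the abstract only; it is not the authors' implementation, and the helper names (`flat_grad`, `hvp_fd`, `cograd_update`) are hypothetical. Transference from task i to task j is quantified as the loss reduction an SGD step on task i yields for task j; to first order, T(i→j) = L_j(θ) − L_j(θ − η g_i) ≈ η ⟨g_i, g_j⟩. Its gradient with respect to θ involves Hessian-vector products, which the sketch approximates with finite differences so that no Hessian is ever formed (the paper's actual Hessian approximation may differ).

```python
import torch

def flat_grad(loss, params):
    """Return the gradient of `loss` w.r.t. `params` as one flat vector."""
    grads = torch.autograd.grad(loss, params)
    return torch.cat([g.reshape(-1) for g in grads])

def perturb_(params, vec, scale):
    """In place: params <- params + scale * vec (vec is a flat vector)."""
    offset = 0
    with torch.no_grad():
        for p in params:
            n = p.numel()
            p.add_(scale * vec[offset:offset + n].view_as(p))
            offset += n

def hvp_fd(loss_fn, params, vec, eps=1e-3):
    """Finite-difference Hessian-vector product: Hv ~= (g(th + eps*v) - g(th)) / eps."""
    g0 = flat_grad(loss_fn(), params)
    perturb_(params, vec, eps)
    g1 = flat_grad(loss_fn(), params)
    perturb_(params, vec, -eps)  # restore the parameters
    return (g1 - g0) / eps

def cograd_update(params, loss_fns, lr=1e-2, beta=0.1):
    """One coordinated step: minimize each task loss, maximize pairwise transference."""
    grads = [flat_grad(fn(), params) for fn in loss_fns]
    # Auxiliary gradient of sum_{i<j} <g_i, g_j>, i.e. H_i g_j + H_j g_i per pair
    # (step-size constants from the first-order expansion are absorbed into beta).
    aux = torch.zeros_like(grads[0])
    for i in range(len(loss_fns)):
        for j in range(i + 1, len(loss_fns)):
            aux += hvp_fd(loss_fns[i], params, grads[j])
            aux += hvp_fd(loss_fns[j], params, grads[i])
    # Descend the summed task losses while ascending the transference term.
    perturb_(params, sum(grads) - beta * aux, -lr)

# Toy usage: one shared linear map serving two regression tasks.
torch.manual_seed(0)
w = torch.randn(4, 2, requires_grad=True)
x = torch.randn(16, 4)
y1, y2 = torch.randn(16, 2), torch.randn(16, 2)
loss_fns = [lambda: ((x @ w - y1) ** 2).mean(),
            lambda: ((x @ w - y2) ** 2).mean()]

for step in range(100):
    cograd_update([w], loss_fns, lr=1e-2, beta=0.1)
print([f"{fn().item():.4f}" for fn in loss_fns])
```

The finite-difference trick costs two extra gradient evaluations per Hessian-vector product instead of any second-order computation, which is one plausible reading of the abstract's "efficient approximation of the Hessian matrix"; a production variant would batch the per-task forward passes rather than re-evaluating closures as done here.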