Rotograd: Dynamic Gradient Homogenization for Multitask Learning

28 Sept 2020 (modified: 05 May 2023) · ICLR 2021 Conference Blind Submission · Readers: Everyone
Keywords: multitask learning, deep learning, gradnorm
Abstract: GradNorm (Chen et al., 2018) is a broadly used gradient-based approach for training multitask networks, where different tasks share, and thus compete for, the network parameters during learning. GradNorm eases the fitting of all individual tasks by dynamically equalizing the contribution of each task to the overall gradient magnitude. However, it does not prevent the individual tasks' gradients from conflicting, i.e., pointing in opposite directions, which results in poor multitask performance. In this work, we propose Rotograd, an extension to GradNorm that addresses this problem by dynamically homogenizing not only the gradient magnitudes but also their directions across tasks. For this purpose, Rotograd adds a layer of task-specific rotation matrices that aligns all the task gradients. Importantly, we then analyze Rotograd (and its predecessor) through the lens of game theory, providing theoretical guarantees on the stability and convergence of the algorithm. Finally, our experiments on several real-world datasets and network architectures show that Rotograd outperforms previous approaches to multitask learning.
One-sentence Summary: Rotograd is a gradient-based multitask learning approach that dynamically homogenizes the gradient magnitudes and directions across tasks.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Reviewed Version (pdf): https://openreview.net/references/pdf?id=zk4cTjiIbf
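
Below is a minimal sketch of the architecture described in the abstract, assuming a PyTorch implementation: a shared encoder feeds task-specific rotation matrices (kept orthogonal via PyTorch's parametrization utilities) placed before each task head. The class and variable names, layer sizes, and the use of the orthogonal parametrization are illustrative assumptions, not the authors' code; the sketch shows only the forward structure, not the paper's update rule for learning the rotations or the GradNorm-style magnitude equalization.

```python
# Illustrative sketch only: shared trunk + per-task rotation + per-task head.
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import orthogonal


class RotogradSketch(nn.Module):
    def __init__(self, in_dim: int, feat_dim: int, num_tasks: int):
        super().__init__()
        # Shared trunk whose parameters all tasks compete for.
        self.encoder = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())
        # One rotation (orthogonal linear map) per task, constrained to stay
        # orthogonal via PyTorch's parametrization utility (an assumption here).
        self.rotations = nn.ModuleList(
            orthogonal(nn.Linear(feat_dim, feat_dim, bias=False))
            for _ in range(num_tasks)
        )
        # Task-specific heads (scalar regression, purely for illustration).
        self.heads = nn.ModuleList(
            nn.Linear(feat_dim, 1) for _ in range(num_tasks)
        )

    def forward(self, x):
        z = self.encoder(x)  # shared representation
        # Each task sees a rotated copy of z; training the rotations so that
        # the per-task gradients w.r.t. z align is the idea the abstract names.
        return [head(rot(z)) for rot, head in zip(self.rotations, self.heads)]


if __name__ == "__main__":
    model = RotogradSketch(in_dim=16, feat_dim=8, num_tasks=3)
    outputs = model(torch.randn(4, 16))
    print([o.shape for o in outputs])  # three outputs of shape (4, 1)
```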