Adapting Auxiliary Losses Using Gradient Similarity

Yunshu Du; Wojciech M. Czarnecki; Siddhant M. Jayakumar; Razvan Pascanu; Balaji Lakshminarayanan

27 Sept 2018 (modified: 22 Jun 2025) | ICLR 2019 Conference Blind Submission | Readers: Everyone

Abstract: One approach to dealing with the statistical inefficiency of neural networks is to rely on auxiliary losses that help to build useful representations. However, it is not always easy to know whether an auxiliary task will help the main task, or when it might start to hurt. We propose using the cosine similarity between the gradients of the tasks as an adaptive weight to detect when an auxiliary loss is helpful to the main loss. We show that our approach is guaranteed to converge to critical points of the main task and demonstrate the practical usefulness of the proposed algorithm in three domains: multi-task supervised learning on subsets of ImageNet, reinforcement learning on gridworld, and reinforcement learning on Atari games.

Keywords: auxiliary losses, transfer learning, task similarity, deep learning, deep reinforcement learning

TL;DR: Auxiliary tasks need to match the main task to improve learning; we propose using the cosine similarity between the gradients of the main task and an unknown auxiliary task to protect against negative interference with learning the main task.
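
The idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names `cosine_similarity` and `combined_gradient` are hypothetical, gradients are represented as flat Python lists, and clipping the weight at zero (so that a conflicting auxiliary gradient is simply ignored) is an assumption made for this example.

```python
import math


def cosine_similarity(g_main, g_aux, eps=1e-12):
    """Cosine of the angle between two flat gradient vectors."""
    dot = sum(a * b for a, b in zip(g_main, g_aux))
    norm_main = math.sqrt(sum(a * a for a in g_main))
    norm_aux = math.sqrt(sum(b * b for b in g_aux))
    # eps avoids division by zero when either gradient vanishes.
    return dot / (norm_main * norm_aux + eps)


def combined_gradient(g_main, g_aux):
    """Main gradient plus the auxiliary gradient scaled by an adaptive weight.

    Assumption of this sketch: the weight is max(0, cos), so an auxiliary
    gradient that points against the main gradient contributes nothing.
    """
    weight = max(0.0, cosine_similarity(g_main, g_aux))
    return [a + weight * b for a, b in zip(g_main, g_aux)]


if __name__ == "__main__":
    g_main = [1.0, 0.0]
    print(combined_gradient(g_main, [1.0, 0.0]))   # aligned auxiliary gradient
    print(combined_gradient(g_main, [0.0, 1.0]))   # orthogonal auxiliary gradient
    print(combined_gradient(g_main, [-1.0, 0.0]))  # conflicting auxiliary gradient
```

Under these assumptions, an aligned auxiliary gradient is added almost in full, while an orthogonal or conflicting one leaves the main-task update unchanged.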

Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/adapting-auxiliary-losses-using-gradient/code)
