Gradient Surgery for Multi-Task Learning

Tianhe Yu; Saurabh Kumar; Abhishek Gupta; Karol Hausman; Sergey Levine; Chelsea Finn

Gradient Surgery for Multi-Task Learning

Tianhe Yu, Saurabh Kumar, Abhishek Gupta, Karol Hausman, Sergey Levine, Chelsea Finn

25 Sept 2019 (modified: 22 Jun 2025)ICLR 2020 Conference Blind SubmissionReaders: Everyone

Keywords: multi-task learning, deep learning

TL;DR: We develop a simple and general approach for avoiding interference between gradients from different tasks, which improves the performance of multi-task learning in both the supervised and reinforcement learning domains.

Abstract: While deep learning and deep reinforcement learning systems have demonstrated impressive results in domains such as image classification, game playing, and robotic control, data efficiency remains a major challenge, particularly as these algorithms learn individual tasks from scratch. Multi-task learning has emerged as a promising approach for sharing structure across multiple tasks to enable more efficient learning. However, the multi-task setting presents a number of optimization challenges, making it difficult to realize large efficiency gains compared to learning tasks independently. The reasons why multi-task learning is so challenging compared to single task learning are not fully understood. Motivated by the insight that gradient interference causes optimization challenges, we develop a simple and general approach for avoiding interference between gradients from different tasks, by altering the gradients through a technique we refer to as “gradient surgery”. We propose a form of gradient surgery that projects the gradient of a task onto the normal plane of the gradient of any other task that has a conflicting gradient. On a series of challenging multi-task supervised and multi-task reinforcement learning problems, we find that this approach leads to substantial gains in efficiency and performance. Further, it can be effectively combined with previously-proposed multi-task architectures for enhanced performance in a model-agnostic way.

Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 4 code implementations](https://www.catalyzex.com/paper/gradient-surgery-for-multi-task-learning/code)

Original Pdf: pdf

11 Replies

Loading