Keywords: Multi-task learning, Multi-objective optimization, Cooperative multi-agent reinforcement learning
TL;DR: We propose TaskForce, a multi-task optimization framework leveraging cooperative multi-agent reinforcement learning.
Abstract: Multi-task learning (MTL) involves the simultaneous optimization of multiple task-specific losses, often leading to gradient conflicts and scale imbalances that result in negative transfer. While existing multi-task optimization methods attempt to mitigate these challenges, they either lack the stochasticity needed to escape poor local minima or fail to explicitly address conflicts at the gradient level. In this work, we propose TaskForce, a novel multi-task optimization framework incorporating cooperative multi-agent reinforcement learning (MARL), where agents learn to find an effective joint optimization strategy based on their respective task gradients and losses. To keep the optimization process compact yet informative, agents observe a summary of the training dynamics that consists of the gradient Gram matrix—capturing both gradient magnitudes and pairwise alignments—and task loss values. Each agent then predicts the balancing parameters that determine the weight of their contribution to the final gradient update. Crucially, we design a hybrid reward function that incorporates both gradient-based signals and loss improvement dynamics, enabling agents to effectively resolve gradient conflicts and avoid poor convergence by considering both direct gradient information and the resulting impact on loss reduction. TaskForce achieves consistent improvements over state-of-the-art MTL baselines on NYU-v2, Cityscapes, and QM9, demonstrating the promise of cooperative MARL in complex multi-task scenarios.
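To make the observation-and-action interface described above concrete, the following is a minimal illustrative sketch, not the authors' implementation: the function names (`make_observation`, `combine_gradients`), the softmax normalization of the agents' outputs, and the toy dimensions are all assumptions introduced here for clarity.

```python
import numpy as np

def make_observation(task_grads, task_losses):
    """Summary of training dynamics as described in the abstract:
    the gradient Gram matrix (capturing magnitudes and pairwise
    alignments) concatenated with the current task losses."""
    G = np.stack(task_grads)                 # (K, d): one flattened gradient per task
    gram = G @ G.T                           # (K, K): entry (i, j) = <g_i, g_j>
    return np.concatenate([gram.ravel(), np.asarray(task_losses)])

def combine_gradients(task_grads, weights):
    """Weighted combination of per-task gradients into a single update
    direction, using the balancing parameters predicted by the agents.
    The softmax normalization is an illustrative choice, not necessarily
    the scheme used in the paper."""
    G = np.stack(task_grads)
    w = np.exp(weights) / np.exp(weights).sum()
    return w @ G                             # (d,) shared update direction

# Toy usage with K = 3 tasks and a 5-dimensional parameter space.
rng = np.random.default_rng(0)
grads = [rng.normal(size=5) for _ in range(3)]
losses = [1.2, 0.8, 2.5]
obs = make_observation(grads, losses)        # observation fed to the MARL agents
weights = rng.normal(size=3)                 # stand-in for the agents' actions
update = combine_gradients(grads, weights)   # gradient applied to shared parameters
```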
Primary Area: optimization
Submission Number: 12548