Abstract: Although recent advances in offline multi-task reinforcement learning (MTRL) have harnessed the powerful capabilities of the Transformer architecture, most approaches focus on a limited number of tasks, and scaling to extremely large numbers of tasks remains a formidable challenge. In this paper, we first revisit the key impact of the number of tasks on current MTRL methods, and further reveal that naively expanding the parameters is insufficient to counteract the performance degradation as the number of tasks grows. Building upon these insights, we propose M3DT, a novel mixture-of-experts (MoE) framework that tackles task scalability by further unlocking the model’s parameter scalability. Specifically, we enhance both the architecture and the optimization of the agent: we strengthen the Decision Transformer (DT) backbone with MoE to reduce the task load on each parameter subset, and introduce a three-stage training mechanism to facilitate efficient training with optimal performance. Experimental results show that, by increasing the number of experts, M3DT not only consistently improves performance with model expansion on a fixed number of tasks, but also exhibits remarkable task scalability, successfully extending to 160 tasks with superior performance.
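To make the architectural idea concrete, below is a minimal, hedged sketch of the core mechanism the abstract describes: replacing a Transformer feed-forward layer with a mixture-of-experts so that each parameter subset (expert) handles only part of the task load. This is not the authors' implementation (see the linked repository for that); all names, sizes, and the top-k routing choice are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    """Illustrative MoE feed-forward layer for a DT-style Transformer block.

    A router assigns each token to its top-k experts, so different experts
    specialize on different subsets of the input (e.g., different task groups).
    Hyperparameters here are placeholders, not the paper's values.
    """

    def __init__(self, d_model=128, d_hidden=512, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):  # x: (batch, seq_len, d_model)
        logits = self.router(x)                         # (B, T, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)  # route each token to top-k experts
        weights = F.softmax(weights, dim=-1)            # normalize the selected weights
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., k] == e                 # tokens sent to expert e in slot k
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out

# Usage: y = MoEFeedForward()(torch.randn(2, 20, 128))  -> shape (2, 20, 128)
```

Because only k of the n experts run per token, the total parameter count can grow with the number of experts while per-token compute stays roughly fixed, which is the property that enables the parameter scalability discussed above.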
Lay Summary: Sequence modeling–based offline reinforcement learning holds great promise, but current approaches struggle to scale to large-scale multi-task reinforcement learning (MTRL). In this work, we first revisit the limitations of such methods in terms of scalability with respect to both the number of tasks and model parameters. To address these challenges, we propose M3DT, a novel framework that enables task scalability by unlocking parameter scalability. M3DT leverages a mixture-of-experts (MoE) architecture to achieve efficient parameter decoupling and expansion, employs task grouping to reduce the number of tasks handled by each expert and ease the learning burden, and adopts a three-stage training paradigm for targeted optimization. Experimental results demonstrate that our method achieves strong performance, paving the way for future research in sequence modeling–based offline RL and MTRL.
Link To Code: https://github.com/KongYilun/M3DT
Primary Area: Reinforcement Learning->Batch/Offline
Keywords: multi-task RL, offline RL, Decision Transformer, Mixture-of-Experts
Submission Number: 11746