Omni-Think: Scaling Multi-Task Learning in LLMs via Reinforcement Learning

Published: 09 Jul 2025, Last Modified: 16 Jul 2025 | AI4Math@ICML25 Poster | CC BY-NC-SA 4.0
Keywords: Multi-Task Learning, Large Language Model, Reinforcement Learning, Curriculum Learning
TL;DR: We present Omni-Think, a unified multi-task RL framework that combines verifiable and generative rewards to train LLMs across diverse tasks, achieving strong generalization through curriculum-guided optimization.
Abstract: The pursuit of general-purpose artificial intelligence demands large language models (LLMs) capable of excelling across diverse tasks, ranging from symbolic reasoning to open-ended generation. However, existing post-training methods, such as Supervised Fine-Tuning (SFT), often fall short in multi-task settings, leading to poor generalization and memorization rather than transferable capabilities. In this work, we introduce Omni-Think, a unified framework that enhances LLM performance across both structured and open-ended tasks. Our approach integrates rule-based verifiable rewards with generative preference signals obtained through LLM-as-a-Judge evaluations, enabling consistent optimization across heterogeneous task types. To better understand the dynamics of multi-task RL, we explore different task scheduling strategies and find that introducing tasks in a progression from structured to open-ended leads to better generalization and mitigated forgetting. Experiments across four domains reveal that curriculum training improves average relative performance by 5.2% over joint multi-task RL and by 9.1% over merging models trained via RL on individual tasks. These findings highlight the value of task-aware sampling and hybrid supervision in scaling RL-based post-training for general-purpose LLMs.
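
To make the two ideas in the abstract concrete, here is a minimal Python sketch of (a) a hybrid reward that uses a rule-based verifier for structured tasks and an LLM-as-a-Judge preference score for open-ended ones, and (b) a structured-to-open-ended curriculum schedule. All function and task names here are hypothetical illustrations, not the authors' released implementation.

```python
# Illustrative sketch only -- names and task splits are assumptions,
# not the Omni-Think codebase.
import random
from typing import Callable, Dict, List


def hybrid_reward(task_type: str,
                  response: str,
                  reference: str,
                  verifier: Callable[[str, str], bool],
                  judge: Callable[[str, str], float]) -> float:
    """Scalar reward for one rollout.

    Structured tasks (e.g. math, code) get a rule-based verifiable reward;
    open-ended tasks fall back to an LLM-as-a-Judge score in [0, 1].
    """
    if task_type in {"math", "code"}:          # verifiable, rule-checked tasks
        return 1.0 if verifier(response, reference) else 0.0
    return judge(response, reference)          # generative preference signal


def curriculum_schedule(tasks: Dict[str, List[dict]],
                        order: List[str],
                        steps_per_stage: int) -> List[dict]:
    """Order training examples from structured to open-ended tasks.

    `order` lists task names from most structured to most open-ended;
    each stage mixes the newly introduced task with earlier ones,
    which is one simple way to mitigate forgetting.
    """
    schedule: List[dict] = []
    seen: List[str] = []
    for task in order:
        seen.append(task)
        pool = [ex for t in seen for ex in tasks[t]]
        schedule.extend(random.sample(pool, min(steps_per_stage, len(pool))))
    return schedule
```

In this sketch, a training loop would call hybrid_reward on each sampled rollout and iterate over curriculum_schedule(tasks, order=["math", "code", "qa", "creative_writing"], steps_per_stage=1000); the actual task set, mixing ratios, and judge prompt are choices the paper itself specifies.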
Submission Number: 118