Keywords: Multi-Task Learning, Large Language Model, Reinforcement Learning, Curriculum Learning
TL;DR: We present Omni-Think, a unified multi-task RL framework that combines verifiable and generative rewards to train LLMs across diverse tasks, achieving strong generalization through curriculum-guided optimization.
Abstract: The pursuit of general-purpose artificial intelligence demands large language models (LLMs) capable of excelling across diverse tasks, ranging from symbolic reasoning to open-ended generation. However, existing post-training methods, such as Supervised Fine-Tuning (SFT), often fall short in multi-task settings, leading to poor generalization and memorization rather than transferable capabilities. In this work, we introduce Omni-Think, a unified framework that enhances LLM performance across both structured and open-ended tasks. Our approach integrates rule-based verifiable rewards with generative preference signals obtained through LLM-as-a-Judge evaluations, enabling consistent optimization across heterogeneous task types. To better understand the dynamics of multi-task RL, we explore different task scheduling strategies and find that introducing tasks in a progression from structured to open-ended leads to better generalization and mitigated forgetting. Experiments across four domains reveal that curriculum training improves average relative performance by 5.2% over joint multi-task RL and by 9.1% over merging models trained via RL on individual tasks. These findings highlight the value of task-aware sampling and hybrid supervision in scaling RL-based post-training for general-purpose LLMs.
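The abstract names two mechanisms: a hybrid reward that combines rule-based verification with LLM-as-a-Judge preference scores, and a curriculum that introduces tasks in a structured-to-open-ended order. Below is a minimal, illustrative sketch of how such reward routing and staging could be wired together; the task names, function names (route_reward, curriculum_stages, judge_reward), and the stubbed judge score are assumptions for illustration, not the paper's actual implementation.

```python
# Illustrative sketch only; names and task lists are hypothetical, not the paper's code.
from dataclasses import dataclass
from typing import Optional, List


@dataclass
class Sample:
    task: str                    # e.g. "math", "code", "qa", "creative_writing"
    prompt: str
    response: str
    reference: Optional[str] = None   # gold answer, available for verifiable tasks


# Assumed curriculum order from the abstract: structured tasks first, open-ended last.
TASK_CURRICULUM = ["math", "code", "qa", "creative_writing"]
VERIFIABLE_TASKS = {"math", "code"}


def verifiable_reward(sample: Sample) -> float:
    """Rule-based check, e.g. exact match against a reference answer."""
    if sample.reference is None:
        return 0.0
    return 1.0 if sample.response.strip() == sample.reference.strip() else 0.0


def judge_reward(sample: Sample) -> float:
    """Placeholder for an LLM-as-a-Judge preference score in [0, 1]."""
    # In practice this would query a judge model; a constant stub keeps the sketch runnable.
    return 0.5


def route_reward(sample: Sample) -> float:
    """Route each sample to a verifiable or generative reward by task type."""
    if sample.task in VERIFIABLE_TASKS:
        return verifiable_reward(sample)
    return judge_reward(sample)


def curriculum_stages(samples: List[Sample]) -> List[List[Sample]]:
    """Group training samples into stages following the structured-to-open-ended order."""
    return [[s for s in samples if s.task == task] for task in TASK_CURRICULUM]
```

In this sketch, an RL trainer would iterate over the stages returned by curriculum_stages and score rollouts with route_reward, so structured tasks with checkable answers are optimized before open-ended ones scored by the judge.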
Submission Number: 118