Llama-Nemotron: Efficient Reasoning Models

Published: 12 Jun 2025, Last Modified: 21 Jun 2025 · EXAIT@ICML 2025 Poster · CC BY 4.0
Track: Language Modeling
Keywords: Reinforcement learning, LLMs, AI agents, AI alignment, RLVR
TL;DR: We introduce Llama-Nemotron, a family of open reasoning models that leverage large-scale SFT and reinforcement learning with an exploration-driven curriculum to surpass teacher performance on challenging reasoning benchmarks such as GPQA.
Abstract: We introduce the Llama-Nemotron series: open, heterogeneous reasoning models in three sizes—Nano (8B), Super (49B), and Ultra (253B)—that deliver strong reasoning, inference efficiency, and a permissive license. Our training pipeline combines neural architecture search, knowledge distillation, and a reasoning-focused post-training stage with supervised fine-tuning and large-scale reinforcement learning. The RL stage leverages an exploration-driven curriculum and data filtering strategies to systematically challenge the model with increasingly difficult reasoning tasks, enabling it to discover and refine complex problem-solving chains beyond the capabilities of supervised learning. This approach allows the model to autonomously explore new reasoning strategies and surpass teacher performance on challenging benchmarks. Ultra achieves significantly higher GPQA accuracy than its teacher and outperforms DeepSeek-R1 and other open models on key reasoning tasks. Llama-Nemotron models are also the first open-source models to support a dynamic reasoning toggle. We open-source all data, models, and code to support open research.
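The difficulty-based curriculum described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' released code: the `Prompt` type, the pass-rate heuristic, and the bucket thresholds are all assumptions about how one might stage training data from easy to hard while filtering out prompts that yield no reward signal.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Prompt:
    text: str
    pass_rate: float  # illustrative: fraction of sampled solutions verified correct

def curriculum_stages(prompts: List[Prompt],
                      thresholds=(0.75, 0.5, 0.25, 0.0)) -> List[List[Prompt]]:
    """Bucket prompts by estimated difficulty (lower pass rate = harder).

    Stage 0 holds the easiest prompts; later stages introduce progressively
    harder ones. Prompts the model never solves (pass_rate == 0.0) are
    dropped, since they provide no learning signal under verifiable-reward RL.
    """
    stages = []
    upper = 1.01  # include pass_rate == 1.0 in the first bucket
    for lower in thresholds:
        stages.append([p for p in prompts if lower < p.pass_rate <= upper])
        upper = lower
    return stages
```

Training would then sweep the stages in order, so the policy is always challenged near the edge of its current ability rather than on problems it already solves or cannot yet attempt.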
Serve As Reviewer: ~Soumye_Singhal1, ~Alexander_Bukharin1, ~Tugrul_Konuk1
Submission Number: 86