Llama-Nemotron: Efficient Reasoning Models

Published: 12 Jun 2025, Last Modified: 21 Jun 2025 · EXAIT@ICML 2025 Poster · CC BY 4.0
Track: Language Modeling
Keywords: Reinforcement learning, LLMs, AI agents, AI alignment, RLVR
TL;DR: We introduce Llama-Nemotron, a family of open reasoning models that leverage large-scale SFT and reinforcement learning with an exploration-driven curriculum to surpass teacher performance on challenging reasoning benchmarks such as GPQA.
Abstract: We introduce the Llama-Nemotron series: open, heterogeneous reasoning models in three sizes—Nano (8B), Super (49B), and Ultra (253B)—that deliver strong reasoning, inference efficiency, and a permissive license. Our training pipeline combines neural architecture search, knowledge distillation, and a reasoning-focused post-training stage with supervised fine-tuning and large-scale reinforcement learning. The RL stage leverages an exploration-driven curriculum and data filtering strategies to systematically challenge the model with increasingly difficult reasoning tasks, enabling it to discover and refine complex problem-solving chains beyond the capabilities of supervised learning. This approach allows the model to autonomously explore new reasoning strategies and surpass teacher performance on challenging benchmarks. Ultra achieves significantly higher GPQA accuracy than its teacher and outperforms DeepSeek-R1 and other open models on key reasoning tasks. Llama-Nemotron models are also the first open-source models to support a dynamic reasoning toggle. We open-source all data, models, and code to support open research.
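The difficulty-based curriculum described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' released code: the `Prompt` type, the pass-rate heuristic, and the bucket thresholds are all assumptions about how one might stage training data from easy to hard while filtering out prompts that yield no reward signal.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Prompt:
    text: str
    pass_rate: float  # illustrative: fraction of sampled solutions verified correct

def curriculum_stages(prompts: List[Prompt],
                      thresholds=(0.75, 0.5, 0.25, 0.0)) -> List[List[Prompt]]:
    """Bucket prompts by estimated difficulty (lower pass rate = harder).

    Stage 0 holds the easiest prompts; later stages introduce progressively
    harder ones. Prompts the model never solves (pass_rate == 0.0) are
    dropped, since they provide no learning signal under verifiable-reward RL.
    """
    stages = []
    upper = 1.01  # include pass_rate == 1.0 in the first bucket
    for lower in thresholds:
        stages.append([p for p in prompts if lower < p.pass_rate <= upper])
        upper = lower
    return stages
```

Training would then sweep the stages in order, so the policy is always challenged near the edge of its current ability rather than on problems it already solves or cannot yet attempt.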
Serve As Reviewer: ~Soumye_Singhal1, ~Alexander_Bukharin1, ~Tugrul_Konuk1
Submission Number: 86