Keywords: Branch-Merge Distillation, Domain-specific Supervised Fine-Tuning, Model Merging
Abstract: It is beneficial but challenging to reduce the size of Large Language Models (LLMs) while maintaining their performance. Existing methods, such as naive model distillation, often fail to achieve high accuracy. To address this limitation, we introduce our Branch-Merge distillation approach: first, domain-specific knowledge from a large teacher model is selectively distilled into separate student expert models; then, these student experts are merged to build a generalized model with cross-domain knowledge. With our distillation approach, we create TinyR1-32B-Preview, which outperforms the original student across multiple benchmarks, including Mathematics (+5.5), Coding (+4.4), and Science (+2.9), and achieves performance comparable to DeepSeek-R1 on AIME 2024. Our Branch-Merge distillation provides a novel solution for creating smaller, high-performing LLMs with reduced computational cost and time.
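To illustrate the "merge" step described in the abstract, below is a minimal sketch of combining domain-expert student checkpoints by per-parameter weighted averaging. This is only one common merging strategy; the abstract does not specify the actual algorithm used for TinyR1-32B-Preview, and the function and variable names here are illustrative assumptions.

```python
# Hedged sketch: merging domain-expert student models by weighted parameter
# averaging. The weighting scheme and names below are assumptions, not the
# paper's stated method.
from typing import Dict, List
import torch

def merge_experts(expert_state_dicts: List[Dict[str, torch.Tensor]],
                  weights: List[float]) -> Dict[str, torch.Tensor]:
    """Combine several expert checkpoints into one generalized model
    by taking a per-parameter weighted average."""
    assert len(expert_state_dicts) == len(weights)
    total = sum(weights)
    merged: Dict[str, torch.Tensor] = {}
    for name in expert_state_dicts[0]:
        merged[name] = sum(
            (w / total) * sd[name].float()
            for sd, w in zip(expert_state_dicts, weights)
        )
    return merged

# Usage (illustrative): equal-weight merge of math, code, and science experts.
# merged = merge_experts([math_sd, code_sd, sci_sd], weights=[1.0, 1.0, 1.0])
```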
Paper Type: Short
Research Area: LLM Efficiency
Research Area Keywords: distillation, parameter-efficient-training, LLM Efficiency
Contribution Types: NLP engineering experiment, Approaches low compute settings-efficiency, Publicly available software and/or pre-trained models
Languages Studied: English
Submission Number: 9373