TRM-Planner: Offline Target Planning and Distillation for Tiny Recursive Models

ACL ARR 2026 January Submission 7342 Authors

06 Jan 2026 (modified: 20 Mar 2026), CC BY 4.0
Keywords: Knowledge distillation, Iterative refinement, Reasoning model, Offline caching, Planning
Abstract: Tiny Recursive Models (TRMs) perform iterative reasoning with an Adaptive Computation Time (ACT)-style loop, but their supervised training targets can be brittle and their halting behavior difficult to tune. We introduce TRM-Planner, a two-stage distillation recipe that shifts compute to an offline teacher-cache stage. A frozen TRM checkpoint is unrolled for multiple refinement steps and stochastic rollouts; for each instance, we cache a small set of teacher entries (tokens, logits, step index, and quality metadata). A student TRM is then trained with the standard TRM objective plus a distillation loss computed from the cached entries. Across Sudoku-Extreme and ARC-AGI-1/2, TRM-Planner improves over our reproduced TRM baseline while leaving student-side inference unchanged. On ARC-AGI-1 and ARC-AGI-2 with a 7M-parameter student, two-attempt accuracy (pass@2) rises from 43.1% to 48.1% and from 6.7% to 9.2%, respectively.
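To make the abstract's training objective concrete, below is a minimal sketch of combining a task loss with a distillation loss computed from a cached teacher entry. All names (the cache-entry fields, `distill_loss`, `combined_loss`, the temperature and weighting scheme) are illustrative assumptions, not the paper's actual implementation; the KL-on-softened-logits form is the standard knowledge-distillation loss and may differ from what the authors use.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def distill_loss(student_logits, cached_teacher_logits, temperature=2.0):
    # KL(teacher || student) on temperature-softened distributions,
    # summed over positions and vocabulary.
    t = softmax(cached_teacher_logits / temperature)
    s = softmax(student_logits / temperature)
    return float(np.sum(t * (np.log(t + 1e-12) - np.log(s + 1e-12))))

# Hypothetical cache entry mirroring the fields named in the abstract
# (tokens, logits, step index, quality metadata); layout is assumed.
entry = {
    "tokens": [3, 1, 4],                          # teacher output tokens
    "logits": np.array([[2.0, 0.5, -1.0, 0.1]] * 3),  # (seq_len, vocab)
    "step": 4,                                    # refinement-step index
    "quality": 1.0,                               # scalar quality weight
}

def combined_loss(task_loss, student_logits, entry, lam=0.5):
    # Standard TRM objective plus a quality-weighted distillation term.
    return task_loss + lam * entry["quality"] * distill_loss(student_logits, entry["logits"])
```

A quality weight of zero recovers the plain TRM objective, so low-quality cached rollouts can be down-weighted rather than filtered out.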
Paper Type: Long
Research Area: LLM Efficiency
Research Area Keywords: distillation, data augmentation, data-efficient training, parameter-efficient-training
Languages Studied: non-linguistic grid/puzzle token sequences
Submission Number: 7342