How to Train Your HRM

Published: 02 Mar 2026 · Last Modified: 06 Apr 2026 · LIT Workshop @ ICLR 2026 · CC BY 4.0
Track: tiny / short paper (up to 5 pages)
Keywords: ARC Challenge, test-time training, hierarchical reasoning model, pretraining
TL;DR: We investigate training curricula for HRMs, including an offline pre-training phase on the available training tasks, online fine-tuning on the evaluation tasks, and per-task overfitting.
Abstract: Hierarchical Reasoning Models (HRMs) are a recently proposed architecture for complex reasoning tasks such as the Abstraction and Reasoning Corpus (ARC-AGI) challenge, in which the objective is to learn an underlying transformation demonstrated by example input–output pairs. The HRM learns these transformations via supervised learning on the demonstration pairs. Because each task involves an entirely new transformation, test-time training on the evaluation tasks is necessary. We investigate training curricula for HRMs that compensate for limited test-time compute, focusing on three stages: offline pre-training on the available training data; test-time fine-tuning on the evaluation tasks; and test-time, per-task `overfitting', in which a specialized model is trained for each task. Our results suggest that pre-training can offer early gains that may not persist, and that fine-tuning on all tasks (training and evaluation) is optimal. The majority of test-time compute should be spent on fine-tuning rather than overfitting, typically at a ratio of 2:1 or more.
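The three-stage curriculum and the recommended 2:1 budget split can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the function names (`split_test_time_budget`, `curriculum`) and the exact scheduling logic are assumptions; only the stage ordering and the ">= 2:1 fine-tuning vs. overfitting" split come from the abstract.

```python
# Hypothetical sketch of the curriculum described in the abstract.
# Stage names and the 2:1 split follow the text; everything else is illustrative.

def split_test_time_budget(total_steps: int, ratio: float = 2.0):
    """Split a test-time compute budget (in optimizer steps) between shared
    fine-tuning and per-task overfitting at roughly `ratio`:1."""
    fine_tune_steps = round(total_steps * ratio / (ratio + 1))
    overfit_steps = total_steps - fine_tune_steps
    return fine_tune_steps, overfit_steps


def curriculum(train_tasks, eval_tasks, total_test_steps: int):
    """Return the (stage, tasks, steps) schedule implied by the paper:
    offline pre-training, fine-tuning on all tasks, then per-task overfitting."""
    ft_steps, of_steps = split_test_time_budget(total_test_steps)
    return [
        # Offline phase: runs before test time, so no test-time step budget.
        ("pretrain", list(train_tasks), None),
        # Test-time fine-tuning on all tasks (training + evaluation).
        ("fine_tune", list(train_tasks) + list(eval_tasks), ft_steps),
        # Remaining budget: one specialized model per evaluation task.
        ("overfit_per_task", list(eval_tasks), of_steps),
    ]
```

For example, with a 300-step test-time budget, `curriculum(["t1"], ["e1", "e2"], 300)` allocates 200 steps to fine-tuning on all three tasks and 100 steps to per-task overfitting on the evaluation tasks.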
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 62