Keywords: hierarchical decoder, hierarchical text classification, emotional support conversation
TL;DR: We propose a new type of language model, a hierarchical decoder, that performs well on multiple classification and generation tasks.
Abstract: Decoder-only language models, such as GPT and LLaMA, generally decode only at the last layer. Motivated by humans' hierarchical reasoning capability, we propose a hierarchical decoder architecture in which different layers decode text in a streaming manner. Given limited time and computational resources, we adapt a pretrained language model into this form of hierarchical decoder: the language head of the last layer is copied to selected intermediate layers and fine-tuned on different task inputs. Through extensive experiments, we validate that these selected intermediate layers can be adapted to produce meaningful and coherent text, and that this hierarchical decoding paradigm achieves state-of-the-art performance on multiple tasks, including hierarchical text classification, classification-guided generation, and hierarchical text generation. The resulting model, HdLM, outperforms all baselines on WoS, DBpedia, ESConv, EmpatheticDialogues, AQuA, CommonSenseQA, and several cognitive tests. We also provide a thorough theoretical analysis that validates the convergence and computational savings of our methodology. Our code and model are available at https://anonymous.4open.science/r/HdLM-0B90/.
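As a minimal illustration of the adaptation the abstract describes, the sketch below copies a pretrained model's final language head onto selected intermediate layers and decodes next-token logits from their hidden states. The `gpt2` checkpoint, the layer indices `[4, 8]`, and the greedy decode are illustrative assumptions, not the paper's exact configuration; in HdLM each copied head would additionally be fine-tuned on its own task inputs.

```python
import copy
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative checkpoint and decoding layers (assumptions, not the
# paper's configuration).
model_name = "gpt2"
intermediate_layers = [4, 8]

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# One language head per selected intermediate layer, initialized as a
# copy of the final head; in HdLM these would then be fine-tuned on
# different task inputs.
intermediate_heads = torch.nn.ModuleDict(
    {str(i): copy.deepcopy(model.lm_head) for i in intermediate_layers}
)

inputs = tokenizer("Classify this document:", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# Decode next-token logits from each selected layer's hidden states.
# hidden_states[0] is the embedding output, so layer i maps to index i.
for i in intermediate_layers:
    hidden = outputs.hidden_states[i]            # (batch, seq, dim)
    logits = intermediate_heads[str(i)](hidden)  # (batch, seq, vocab)
    next_token = logits[:, -1, :].argmax(dim=-1)
    print(f"layer {i}:", tokenizer.decode(next_token))
```

In this sketch the backbone is frozen during the forward pass; only the copied heads (and, per the abstract, the layers they attach to) would be updated during fine-tuning, which is where the claimed computational savings come from.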
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 13147