The Benefits in Shallow: Merge Decoding Across Large Language Model Layers

Published: 01 Jan 2024, Last Modified: 15 May 2025. NLPCC (2) 2024. License: CC BY-SA 4.0
Abstract: Large language models (LLMs) have become foundational to numerous natural language processing tasks; however, decoding coherent and contextually relevant text remains a complex challenge. In open-ended generation, maximizing probability is often not the appropriate objective, and sampled continuations tend to be incoherent and repetitive to varying degrees. We propose Merge Decoding (MD), which merges information from shallow layers, such as sequential information, into the final task-specific layer, thereby generating coherent and rich text. MD works across three scales of the LLaMA family (7B, 13B, 30B), achieving higher-quality text in open-ended generation (WikiText, WikiNews, BookCorpus) and enhancing reasoning capabilities in downstream tasks (GSM8K, StrategyQA). Code is available at https://github.com/YcChou/MergeDecoding.
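To make the idea concrete, the sketch below shows one way layer-merged decoding could look with a Hugging Face LLaMA checkpoint: a shallow layer's hidden state is projected through the LM head and its next-token distribution is interpolated with the final layer's. The checkpoint name, the shallow layer index, the interpolation weight `alpha`, and the merge rule itself are illustrative assumptions, not the exact formulation from the paper; consult the linked repository for the authors' implementation.

```python
# Hypothetical sketch of merging a shallow layer's distribution into the final
# layer during greedy decoding. The merge rule (simple probability interpolation)
# is an assumption for illustration only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "huggyllama/llama-7b"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
model.eval()


@torch.no_grad()
def merge_decode(prompt, shallow_layer=8, alpha=0.3, max_new_tokens=64):
    """Greedy decoding that mixes a shallow layer's next-token distribution
    into the final layer's distribution (illustrative, not the paper's exact rule)."""
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    for _ in range(max_new_tokens):
        out = model(input_ids, output_hidden_states=True)
        # Project the shallow layer's last hidden state through the final norm
        # and LM head ("early exit" / logit-lens style readout).
        shallow_hidden = out.hidden_states[shallow_layer][:, -1, :]
        shallow_logits = model.lm_head(model.model.norm(shallow_hidden))
        final_logits = out.logits[:, -1, :]
        # Assumed merge rule: interpolate the two next-token distributions.
        probs = (1 - alpha) * final_logits.softmax(-1) + alpha * shallow_logits.softmax(-1)
        next_id = probs.argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_id], dim=-1)
        if next_id.item() == tokenizer.eos_token_id:
            break
    return tokenizer.decode(input_ids[0], skip_special_tokens=True)


print(merge_decode("Open-ended text generation with language models"))
```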