Keywords: Self-improvement for LLM
Abstract: Large language models (LLMs) excel at generating coherent text but are constrained by their large parameter counts and high memory requirements. Recent studies suggest that dynamically adjusting inference operations can enhance performance without significantly increasing model size. We introduce the stutter mechanism, which enables self-improvement by selectively applying additional layers to challenging tokens, mimicking a human stutter to allocate more computational effort where needed. Our experiments with Pythia models show that the stutter mechanism consistently improves performance across benchmarks. Notably, the Pythia-410M-stutter model outperforms the larger Pythia-1B model on WinoGrande and WSC. Additionally, our method is data-efficient, requiring less than 1% of the pretraining data for additional training. These results demonstrate the stutter mechanism's potential to enhance LLMs' efficiency and performance in real-world applications.
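To make the idea concrete, below is a minimal sketch of a stutter-style adaptive-depth forward pass. It assumes a per-token predictive-entropy threshold as the difficulty signal and a small set of extra transformer layers re-applied only to the tokens flagged as difficult; the class, the `entropy_threshold` parameter, and the routing details are illustrative assumptions, not the paper's actual implementation.

```python
# Hedged sketch: selectively re-process "challenging" tokens with extra layers.
# Names (StutterSketch, entropy_threshold, stutter_layers) are hypothetical.
import torch
import torch.nn as nn


class StutterSketch(nn.Module):
    def __init__(self, d_model=64, n_heads=4, n_layers=4, vocab_size=1000,
                 n_stutter_layers=2, entropy_threshold=2.0):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)

        def make_layer():
            return nn.TransformerEncoderLayer(
                d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True)

        self.layers = nn.ModuleList([make_layer() for _ in range(n_layers)])
        # Extra layers applied only to tokens judged difficult (assumed design).
        self.stutter_layers = nn.ModuleList(
            [make_layer() for _ in range(n_stutter_layers)])
        self.lm_head = nn.Linear(d_model, vocab_size)
        self.entropy_threshold = entropy_threshold

    def forward(self, token_ids):
        h = self.embed(token_ids)              # (batch, seq, d_model)
        for layer in self.layers:
            h = layer(h)

        # Per-token predictive entropy as a stand-in difficulty signal.
        logits = self.lm_head(h)
        probs = logits.softmax(dim=-1)
        entropy = -(probs * probs.clamp_min(1e-9).log()).sum(dim=-1)
        hard = entropy > self.entropy_threshold  # (batch, seq) bool mask

        # "Stutter": give challenging tokens another pass through extra layers.
        h_extra = h
        for layer in self.stutter_layers:
            h_extra = layer(h_extra)
        h = torch.where(hard.unsqueeze(-1), h_extra, h)
        return self.lm_head(h)


if __name__ == "__main__":
    model = StutterSketch()
    out = model(torch.randint(0, 1000, (2, 16)))
    print(out.shape)  # torch.Size([2, 16, 1000])
```

In this sketch, the extra layers run on the full sequence and the mask merges their output only for hard tokens; a production version would likely gather just the flagged positions to realize the compute savings the abstract describes.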
Submission Number: 24