Guiding Language Model Reasoning with Planning Tokens

Xinyi Wang; Lucas Caccia; Oleksiy Ostapenko; Xingdi Yuan; William Yang Wang; Alessandro Sordoni

Guiding Language Model Reasoning with Planning Tokens

Xinyi Wang, Lucas Caccia, Oleksiy Ostapenko, Xingdi Yuan, William Yang Wang, Alessandro Sordoni

Published: 10 Jul 2024, Last Modified: 26 Aug 2024COLMEveryoneRevisionsBibTeXCC BY 4.0

Research Area: Alignment, Learning algorithms for LMs, Inference algorithms for LMs

Keywords: chain-of-thought reasoning, fine-tuning

TL;DR: We propose to add specialized planning tokens in front of each chain-of-thought step to guide and improve language models' math reasoning ability.

Abstract: Large language models (LLMs) have recently attracted considerable interest for their ability to perform complex reasoning tasks, such as chain-of-thought (CoT) reasoning. However, most of the existing approaches to enhance this ability rely heavily on data-driven methods, while neglecting the structural aspects of the model's reasoning capacity. To encourage a more structural generation of CoT steps, we propose a hierarchical generation scheme: we let the LM generate a planning token at the start of each reasoning step, intuitively serving as a high-level plan of the current step, and add their embeddings to the model parameters. Our approach requires a negligible increase in trainable parameters (0.001%) and can be applied through either full fine-tuning or a more parameter-efficient scheme. We demonstrate our method's effectiveness by applying it to three different LLMs, showing notable accuracy improvements across three math word problem datasets and one multihop QA dataset with respect to standard fine-tuning baselines.

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the COLM Code of Ethics on https://colmweb.org/CoE.html

Author Guide: I certify that this submission complies with the submission instructions as described on https://colmweb.org/AuthorGuide.html

Submission Number: 356

Loading