Keywords: LLM reasoning, superposition
Abstract: While Large Language Models (LLMs) usually reason in language space with discrete tokens, recent studies have found that LLMs can also reason in more expressive spaces such as a continuous latent space. However, training LLMs to reason in continuous latent space is challenging due to the lack of sufficient training signals. In this work, we propose a method that teaches LLMs to reason over superpositions of discrete tokens. Our model takes a superposition of token embeddings as input and outputs multiple tokens using a Multi-token Prediction (MTP) module. Our empirical results show that with superposition reasoning, the model uses $\sim$30\% fewer reasoning tokens on GSM8K than the baseline, with no loss in accuracy.
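The abstract describes two components: feeding the model a superposition of token embeddings instead of a single discrete token, and decoding several tokens at once with an MTP module. Below is a minimal sketch of one plausible reading of those ideas, not the authors' implementation: the superposition is assumed to be a probability-weighted mixture of embedding vectors, and the MTP module is assumed to be a set of independent output heads. All names (`superpose`, `MTPHead`), sizes, and shapes are illustrative assumptions.

```python
import torch
import torch.nn as nn

vocab_size, d_model, k_mtp = 32000, 768, 4  # assumed sizes, not from the paper

embedding = nn.Embedding(vocab_size, d_model)

def superpose(token_probs: torch.Tensor) -> torch.Tensor:
    """Mix token embeddings weighted by a probability distribution.

    token_probs: (batch, seq, vocab_size), each row summing to 1.
    Returns:     (batch, seq, d_model) superposed input embeddings.
    """
    # Convex combination of the rows of the embedding matrix.
    return token_probs @ embedding.weight

class MTPHead(nn.Module):
    """Hypothetical multi-token prediction head: k independent linear
    projections, each predicting one of the next k tokens from the same
    hidden state."""
    def __init__(self, d_model: int, vocab_size: int, k: int):
        super().__init__()
        self.heads = nn.ModuleList([nn.Linear(d_model, vocab_size) for _ in range(k)])

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq, d_model) -> logits: (batch, seq, k, vocab_size)
        return torch.stack([head(hidden) for head in self.heads], dim=2)

# Usage sketch: feed soft token distributions instead of hard token ids.
probs = torch.softmax(torch.randn(2, 10, vocab_size), dim=-1)
x = superpose(probs)                          # superposed embeddings for the backbone
logits = MTPHead(d_model, vocab_size, k_mtp)(x)  # k next-token predictions per position
```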
Submission Number: 180