TL;DR: We introduce critical tokens -- elements within reasoning trajectories that significantly influence incorrect outcomes. We use contrastive estimation to pinpoint these tokens and extend this approach to enhance the direct preference optimization (DPO) method.
Abstract: Mathematical reasoning tasks pose significant challenges for large language models (LLMs) because they require precise logical deduction and sequence analysis. In this work, we introduce the concept of critical tokens -- elements within reasoning trajectories that significantly influence incorrect outcomes. We present a novel framework for identifying these tokens through rollout sampling and demonstrate their substantial divergence from traditional error tokens. Through extensive experiments on datasets such as GSM8K and MATH500, we show that identifying and replacing critical tokens significantly improves model accuracy. We propose an efficient methodology for pinpointing these tokens in large-scale datasets using contrastive estimation and extend this framework to enhance model training processes with direct preference optimization (DPO). Experimental results on GSM8K and MATH500 benchmarks with the widely used models Llama-3 (8B and 70B) and Deepseek-math (7B) demonstrate the effectiveness of the proposed approach, cDPO. Our results underscore the potential of leveraging critical tokens to reduce errors in reasoning tasks, advancing the development of AI systems capable of robust logical deduction.
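The contrastive-estimation idea in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes two hypothetical per-token log-probability sequences, one from a model fine-tuned on correct trajectories and one from a model fine-tuned on incorrect trajectories, and flags as "critical" the token whose likelihood ratio most favors the incorrect model. The function name and inputs are illustrative assumptions.

```python
def critical_token_scores(logp_pos, logp_neg):
    """Contrastive score per token: log p_neg(t) - log p_pos(t).

    logp_pos / logp_neg are per-token log-probabilities under models
    trained on correct vs. incorrect reasoning trajectories
    (hypothetical inputs; the paper's exact scoring may differ).
    A high score marks a token far more likely under the model of
    incorrect reasoning -- a candidate critical token.
    """
    return [ln - lp for lp, ln in zip(logp_pos, logp_neg)]

# Toy trajectory: the final token "12" is strongly preferred by the
# incorrect-trajectory model, so it gets the highest contrastive score.
tokens = ["so", "x", "=", "12"]
logp_pos = [-0.5, -1.0, -0.2, -3.0]
logp_neg = [-0.6, -1.1, -0.3, -0.4]
scores = critical_token_scores(logp_pos, logp_neg)
critical = tokens[max(range(len(scores)), key=scores.__getitem__)]
```

In the full pipeline, such token-level scores would then reweight preference pairs during DPO-style training, which is the extension the abstract calls cDPO.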
Lay Summary: Mathematical reasoning tasks pose significant challenges for AI because they require precise logical deduction and sequence analysis. In our research, we discovered that certain specific parts (we call them "critical tokens") within the AI’s reasoning steps strongly affect whether the answer is correct or not. By creating a new way to spot these crucial parts, we showed we can effectively find and fix mistakes, helping the AI solve math problems better. We tested our method thoroughly on well-known math datasets and found that identifying and correcting these critical parts significantly improved accuracy. We further developed an efficient strategy to quickly pinpoint these key elements, making it practical for large datasets and advanced AI training. Our experiments using popular AI models confirmed that focusing on these critical tokens greatly enhances the AI’s ability to reason correctly, marking an important step forward in building smarter and more reliable AI systems.
Link To Code: https://github.com/chenzhiling9954/Critical-Tokens-Matter
Primary Area: Deep Learning->Large Language Models
Keywords: Large Language Models, Reasoning
Submission Number: 2521