Keywords: Chain-of-Thought Reasoning, Data Processing Inequality, Markov chain, Partial Information Decomposition
TL;DR: An information-theoretic framework (Data Processing Inequality, Fano's inequality, Partial Information Decomposition) for analyzing and improving Chain-of-Thought reasoning in LLMs.
Abstract: Chain-of-Thought (CoT) prompting improves the reasoning capabilities of large language models (LLMs), but its theoretical basis remains poorly understood. We propose an information-theoretic framework to analyze and improve CoT through two complementary lenses. First, we model CoT as a Markov process $X \to Z \to Y$, where intermediate steps $Z$ mediate information from inputs $X$ to outputs $Y$. By applying the Data Processing Inequality and Fano's inequality, we show that explicit reasoning lowers the bound on prediction error. Second, we use Partial Information Decomposition (PID) to quantify how CoT rationales contribute to task performance. Our analysis reveals strong synergy: reasoning and answers together provide more information than either alone. Building on this insight, we introduce a PID-guided loss that promotes synergy during CoT distillation. On the e-SNLI dataset, this approach outperforms standard fine-tuning and mutual-information baselines. To validate CoT's benefits in structured domains, we also study few-shot arithmetic reasoning: CoT prompting boosts accuracy from 4\% to 70\% with a single example, and up to 90\% with four, far surpassing regular prompting. Overall, our findings offer a theoretical foundation for CoT and suggest new strategies for improving reasoning in LLMs.
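One standard reading of the DPI/Fano argument the abstract refers to can be sketched as follows (the notation beyond $X \to Z \to Y$ is our assumption, not taken from the paper):

```latex
% For the Markov chain X -> Z -> Y, the Data Processing Inequality gives
%   I(X; Y) \le I(Z; Y),
% equivalently
%   H(Y \mid Z) \le H(Y \mid X).
% Fano's inequality bounds the error P_e of any estimator \hat{Y}(Z):
%   H(Y \mid Z) \le H_b(P_e) + P_e \log(|\mathcal{Y}| - 1),
% which rearranges to the weaker but more common form
%   P_e \ \ge\ \frac{H(Y \mid Z) - 1}{\log |\mathcal{Y}|}.
% A more informative intermediate Z (larger I(Z; Y), hence smaller
% H(Y | Z)) therefore lowers the Fano lower bound on prediction error,
% which is the sense in which explicit reasoning steps can help.
```

Here $H_b(\cdot)$ denotes the binary entropy function and $\mathcal{Y}$ the answer space; both symbols are illustrative assumptions.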
Submission Number: 146