Interpreting Chain-of-thought Reasoning via Partial Information Decomposition
Track: tiny / short paper (up to 4 pages)
Keywords: Chain-of-thought, Partial Information Decomposition, Interpretability
TL;DR: This work proposes a new interpretability framework for evaluating the quality of the reasoning process of LLMs.
Abstract: Large reasoning models have attracted interest for complex tasks. However, on challenging problems they often generate verbose, repetitive, or incorrect reasoning steps. In this work, we introduce SLIDER, a new interpretability framework for evaluating the quality of the reasoning process by assessing consecutive steps for incorrectness and repetitiveness. SLIDER leverages an emerging body of work from information theory called Partial Information Decomposition (PID) to disentangle the information that two consecutive reasoning steps carry about the target into non-negative components: unique information in one reasoning step, $S_i$ or $S_{i+1}$, that is not in the other; redundant information that is common to both steps; and synergistic information that is meaningful only when the steps are considered jointly. Given the responses of a large reasoning model, SLIDER moves across the steps in a sliding window, projects them onto a meaningful embedding space, and then computes a set of new per-token information-decomposition measures that enable the identification of various failure modes. We demonstrate the application of SLIDER to analyzing incorrectness and repetitiveness in several use cases spanning arithmetic problems and GSM8K word problems.
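To make the four PID components concrete, here is a minimal sketch on toy discrete variables. It is not SLIDER's method: the abstract describes per-token measures computed on embeddings, whereas this sketch assumes the simple minimum-mutual-information (MMI) definition of redundancy, $R = \min(I(T;S_1), I(T;S_2))$, and derives the other terms from the standard PID identities $I(T;S_1) = U_1 + R$, $I(T;S_2) = U_2 + R$, and $I(T;S_1,S_2) = U_1 + U_2 + R + \mathrm{Syn}$. The function names `mutual_information` and `pid_mmi` are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(pairs):
    """I(X;Y) in bits, estimated from a list of equally weighted (x, y) samples."""
    n = len(pairs)
    pxy = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def pid_mmi(samples):
    """Decompose I(T; S1, S2) from samples of (s1, s2, t).

    Assumption: redundancy is the MMI measure min(I(T;S1), I(T;S2)),
    chosen for simplicity; it is not taken from the SLIDER paper.
    """
    i1 = mutual_information([(s1, t) for s1, _, t in samples])
    i2 = mutual_information([(s2, t) for _, s2, t in samples])
    i12 = mutual_information([((s1, s2), t) for s1, s2, t in samples])
    redundant = min(i1, i2)
    return {
        "redundant": redundant,
        "unique_1": i1 - redundant,
        "unique_2": i2 - redundant,
        "synergistic": i12 - i1 - i2 + redundant,
    }


# Toy cases: T = S1 XOR S2 (purely synergistic) and T = S1 = S2 (purely redundant).
xor_samples = [(a, b, a ^ b) for a in (0, 1) for b in (0, 1)]
copy_samples = [(a, a, a) for a in (0, 1)]

xor_pid = pid_mmi(xor_samples)
copy_pid = pid_mmi(copy_samples)
```

In the XOR case neither step alone says anything about the target, so all of the information is synergistic. In the copy case the second step repeats the first, so all of it is redundant, which is the kind of signal a repetitiveness check would look for.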
Presenter: ~Barproda_Halder1
Format: Maybe: the presenting author will attend in person, contingent on other factors that still need to be determined (e.g., visa, funding).
Funding: Yes, the presenting author of this submission falls under ICLR’s funding aims, and funding would significantly impact their ability to attend the workshop in person.
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 109