Toward a Mechanistic Understanding of Stepwise Inference in Transformers: A Synthetic Graph Navigation Model

23 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: visualization or interpretation of learned representations
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: mechanistic interpretability, ai safety, synthetic task, alignment, directed acyclic graphs, chain-of-thought, transformers, cognitive science
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: A synthetic task reveals when and why stepwise inference helps autoregressive language models plan and reason.
Abstract: Taking correct steps through elementary logical operations is the essence of logical reasoning, culminating in precise planning outcomes. While such stepwise inference approaches have demonstrated benefits in Large Language Models (LLMs), conducting an accurate quantitative evaluation is challenging, given their extensive scale, complexity, and lack of accessibility. Here, we introduce and explore a paradigm casting stepwise inference as a graph navigation problem. We present a minimal synthetic setup, where an autoregressive language model solves a navigation task on directed acyclic graphs (DAGs), taking inspiration from computational graphs and execution traces. Despite its apparent simplicity, we demonstrate that our synthetic model effectively recapitulates phenomena observed in LLMs. By training the model on sample paths from a start node to a goal node, presented in a 'step-by-step' manner, we perform systematic experiments and develop novel analyses illustrating that stepwise navigation proves advantageous when the underlying graph is hierarchical and generalization necessitates the stitching of subpaths observed during pretraining. Further, we observe a diversity-precision tradeoff while varying sampling temperature and a bias towards generating shorter paths. We next elucidate how in-context chain-of-thought exemplars can steer the model's navigation. Importantly, these exemplars can guide the model to follow a path of reasoning we provide, instead of relying on its potentially biased priors. Together, this work showcases the utility and adaptability of this paradigm in exploring the complexities of logical reasoning and planning in LLMs.
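As a rough illustration of the training-data setup the abstract describes, the sketch below builds a layered (hierarchical) DAG and serializes sampled start-to-goal paths as 'step-by-step' token sequences for an autoregressive model. The token layout, node naming, and DAG construction here are illustrative assumptions, not the authors' exact specification.

```python
import random

def make_layered_dag(num_layers=4, nodes_per_layer=5, edge_prob=0.5, seed=0):
    """Build a random hierarchical DAG: edges only run from layer i to layer i+1.

    This layered construction is an assumption standing in for the paper's
    hierarchical graphs.
    """
    rng = random.Random(seed)
    layers = [[f"n{l}_{i}" for i in range(nodes_per_layer)]
              for l in range(num_layers)]
    edges = {node: [] for layer in layers for node in layer}
    for l in range(num_layers - 1):
        for src in layers[l]:
            for dst in layers[l + 1]:
                if rng.random() < edge_prob:
                    edges[src].append(dst)
    return layers, edges

def sample_path(edges, start, rng):
    """Random walk from `start` until reaching a node with no outgoing edges."""
    path = [start]
    while edges[path[-1]]:
        path.append(rng.choice(edges[path[-1]]))
    return path

def to_training_sequence(path):
    """Serialize a path 'step-by-step': the model is prompted with
    (start, goal) and must emit every intermediate node in order.
    The <bos>/<eos>/':' token layout is hypothetical."""
    start, goal = path[0], path[-1]
    return ["<bos>", start, goal, ":"] + path + ["<eos>"]

rng = random.Random(1)
layers, edges = make_layered_dag()
path = sample_path(edges, layers[0][0], rng)
print(to_training_sequence(path))
# e.g. ['<bos>', 'n0_0', 'n3_2', ':', 'n0_0', 'n1_4', 'n2_1', 'n3_2', '<eos>']
```

Under this framing, the generalization test the abstract mentions amounts to withholding some (start, goal) pairs whose connecting path is only covered as a union of subpaths seen in training, and checking whether the model can stitch them at inference time.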
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: pdf
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7555