Keywords: language models, reasoning, cognitive reflection task, logit lens
Abstract: Given any input, a language model (LM) performs the same kind of computation to produce an output: a single forward pass through the underlying neural network. Inspired by findings in cognitive psychology, we investigate potential signatures of "deeper" and "shallower" computation within a forward pass, without allowing the model to generate intermediate reasoning steps. We prompt LMs with contrasting statements designed to trigger deeper or shallower reasoning on a set of cognitive reflection tasks. We find suggestive evidence that LMs' preferences for correct (deeper) or intuitive (shallower) answers can be manipulated through prompts related not only to general personality traits, but also to situational metabolic, physical, and social factors. We then use the logit lens to investigate how an LM might achieve this behavior. Our results suggest that intuitive answers are preferred in early layers, even when the final behavior is consistent with the correct answer or deeper reasoning. These findings motivate further mechanistic analyses of high-level cognition and reasoning in LMs.
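The abstract's layer-wise analysis relies on the logit lens: projecting each intermediate hidden state through the model's final layer norm and unembedding matrix to read off a per-layer next-token distribution. The sketch below illustrates the general technique, not the authors' exact code; the model name, prompt, and GPT-2-specific attribute names (`transformer.ln_f`, `lm_head`) are illustrative assumptions.

```python
# Minimal logit-lens sketch (illustrative, not the paper's implementation).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any decoder-only causal LM works similarly
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

# Hypothetical cognitive-reflection-style prompt, chosen only for illustration.
prompt = "A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball. The ball costs $"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# hidden_states is a tuple of (num_layers + 1) tensors of shape [1, seq_len, d_model]:
# the embedding output followed by each transformer block's output.
hidden_states = outputs.hidden_states
final_ln = model.transformer.ln_f  # GPT-2's final layer norm
unembed = model.lm_head            # (tied) unembedding matrix

for layer_idx, h in enumerate(hidden_states):
    # Project the last-position hidden state into vocabulary space.
    logits = unembed(final_ln(h[:, -1, :]))
    top_token = tokenizer.decode(logits.argmax(dim=-1))
    print(f"layer {layer_idx:2d}: top next-token prediction = {top_token!r}")
```

Comparing the rank or probability of the intuitive answer token against the correct answer token at each layer gives a layer-by-layer trace of which answer the model prefers, which is the kind of signal the abstract describes.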
Submission Number: 41