Keywords: dual-process theory; system 2; model-free reinforcement learning
TL;DR: This paper posits that System 2 capabilities in RL agents can be understood as control of an internal agent state.
Abstract: Dual-process theory divides cognitive processing into a fast, intuitive System 1 and a slow, deliberative System 2. In reinforcement learning (RL), model-free learning, in which the agent takes actions with a reactive policy, is reminiscent of System 1, whereas model-based decision-time planning is reminiscent of System 2. This paper presents the view that deliberative, System 2 behaviors ("thinking") can be considered a form of mental action that an agent performs before taking an action that influences its external environment. Under this view, we hypothesize that model-free RL alone would be sufficient to produce deliberation if these mental actions ultimately led to higher-value actions being selected. We formalize the notion of a controllable "thought" state, prove conditions under which "thinking" emerges as a strategy for reward maximization, and discuss how large language models serve as a proof of concept for thinking as mental action. We conclude by discussing new opportunities for research on model-free RL agents that learn both to think and to act.
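Illustrative sketch (not taken from the paper): the hypothesis that model-free RL can learn to "think" can be made concrete with a toy problem. Everything below is an assumption made for illustration, not the paper's formalization: the agent state is a single binary thought flag, a hypothetical THINK action sets that flag without touching the environment, and a hypothetical ACT action ends the episode with a reward of 1 with probability `p_fast` (no prior thought) or `p_think` (after thinking). Under these assumptions, tabular Q-learning should prefer thinking first whenever the discounted payoff `gamma * p_think` exceeds `p_fast`.

```python
import random

# Hypothetical action labels: one mental action and one external action.
THINK, ACT = 0, 1


def q_learning(p_fast=0.5, p_think=0.9, gamma=0.9, episodes=20000,
               alpha=0.01, epsilon=0.2, max_steps=10, seed=0):
    """Tabular Q-learning on a toy 'thought state' problem (illustrative only)."""
    rng = random.Random(seed)
    # q[thought][action]; thought is 0 (has not deliberated) or 1 (has deliberated).
    q = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(episodes):
        thought = 0
        for _step in range(max_steps):
            # Epsilon-greedy choice between the mental and the external action.
            if rng.random() < epsilon:
                action = rng.choice((THINK, ACT))
            else:
                action = THINK if q[thought][THINK] > q[thought][ACT] else ACT
            if action == ACT:
                # External action: Bernoulli reward, then the episode ends.
                p_success = p_think if thought else p_fast
                reward = 1.0 if rng.random() < p_success else 0.0
                q[thought][ACT] += alpha * (reward - q[thought][ACT])
                break
            # Mental action: no reward, costs one discounted step, sets thought = 1.
            target = gamma * max(q[1])
            q[thought][THINK] += alpha * (target - q[thought][THINK])
            thought = 1
    return q


def greedy(q, thought):
    """Name of the greedy action in the given thought state."""
    return "think" if q[thought][THINK] > q[thought][ACT] else "act"


if __name__ == "__main__":
    for gamma in (0.9, 0.3):
        q = q_learning(gamma=gamma)
        print(f"gamma={gamma}: greedy first action is {greedy(q, 0)}")
```

With the default values, thinking first is worth roughly `0.9 * 0.9` against `0.5` for acting immediately, so the learned greedy policy should deliberate before acting; with `gamma=0.3` the delay costs more than the improved action is worth, and the reactive choice should win.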
Submission Number: 10