Keywords: learning, decision making, drift-diffusion model, information theory, cognitive modeling
TL;DR: We propose a drift-diffusion model (DDM) in which an agent commits to a decision at the point where a mixture of within-trial reward rate and information rate is maximized, and we provide empirical and normative support for this model.
Abstract: Normative accounts of decision-making predict that people attempt to balance the immediate reward associated with a correct response against the cost of deliberation. However, humans frequently deliberate longer than normative models say they should. We propose that people try to optimize not only their rate of material reward, but also their rate of information gain. A computational model that combines this idea with a standard drift-diffusion process reveals that an agent programmed to maximize a combination of reward and information rates acts like human decision makers, reproducing key patterns of behavior not predicted by existing models. Moreover, if we assume that skill level is sensitive to deliberation time, a novice agent who places even a small weight on information rate will often earn more reward in the long run than one who maximizes reward rate alone. Maximizing a combination of reward and information rates is a relatively simple and myopic strategy, but it approximates optimal behavior over learning, making it a candidate heuristic for this difficult intertemporal choice problem.
In-person Presentation: yes
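
Illustrative sketch (not the authors' implementation): the abstract's central idea, stopping where a mixture of reward rate and information rate is maximized, can be illustrated with a deliberately simplified fixed-threshold drift-diffusion model. Everything specific here is an assumption made for illustration: the symmetric bounds at ±a, the standard closed-form accuracy and mean decision time for that case, a reward of 1 per correct response, a fixed non-decision time `t0`, information measured as the entropy reduction `1 - H(accuracy)` in bits, and the mixture weight `w`. The submission's own model is defined within a trial and may differ in all of these respects.

```python
import math

def accuracy(a, mu, sigma):
    # Probability of hitting the correct bound for a DDM with drift mu,
    # noise sigma, and symmetric bounds at +/- a (unbiased start).
    return 1.0 / (1.0 + math.exp(-2.0 * a * mu / sigma**2))

def mean_decision_time(a, mu, sigma):
    # Mean first-passage time for the same symmetric-bound DDM.
    return (a / mu) * math.tanh(a * mu / sigma**2)

def info_bits(p):
    # Assumed information measure: reduction in entropy (bits) from a
    # 50/50 prior to a binary belief held with confidence p.
    if p <= 0.0 or p >= 1.0:
        return 1.0
    entropy = -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))
    return 1.0 - entropy

def mixed_rate(a, w, mu=1.0, sigma=1.0, t0=1.0):
    # Weighted mixture of reward rate and information rate.
    # w = 0 is pure reward-rate maximization; w = 1 is pure information rate.
    p = accuracy(a, mu, sigma)
    total_time = mean_decision_time(a, mu, sigma) + t0
    reward_rate = p / total_time            # assumes 1 unit reward if correct
    information_rate = info_bits(p) / total_time
    return (1.0 - w) * reward_rate + w * information_rate

def best_threshold(w, **params):
    # Grid search over thresholds (hypothetical helper, 0.01 steps up to 5.0).
    grid = [i * 0.01 for i in range(1, 501)]
    return max(grid, key=lambda a: mixed_rate(a, w, **params))

if __name__ == "__main__":
    for w in (0.0, 0.5, 1.0):
        print(f"w={w:.1f}: best threshold = {best_threshold(w):.2f}")
```

Under these assumed parameters, the threshold that maximizes the mixture grows as `w` increases, so an agent that weights information rate deliberates longer than a pure reward-rate maximizer. This matches the qualitative pattern the abstract describes, but it is not evidence for the paper's quantitative claims.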