Abstract: Strategically variable behavior can be advantageous in various fields such as sports (unpredictability), art (creativity), science (innovation), and problem-solving (thinking outside the box). Although previous studies have identified experimental conditions under which humans and non-human animals show increased variability in decision-making, we have only a limited understanding of the underlying cognitive mechanisms. Using a reinforcement learning model, we simulate the use of three theorized strategies in an adversarial reward learning environment that requires very high variability. Model simulations with a policy-gradient meta-learning algorithm show that agents could respond more optimally in such environments by (1) relying on a stochastic generator, (2) increasing their learning rate to allow for faster interactions between reinforcement learning and extinction, or (3) strategically upvaluing unchosen actions using a frequency-based memory. After demonstrating the theoretical benefit of each of these strategies, we fit our model to existing datasets of human, pigeon, and rat behavior in adversarial environments. We show that, while all three species can engage in highly variable behavior, only humans strategically upvalue unchosen actions as a strategy to achieve variability.
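To make the three strategies concrete, the following is a minimal illustrative sketch and not the model used in this work. It assumes a softmax choice rule and a simple frequency-tracking adversary, both of which are hypothetical stand-ins. The parameter names are also assumptions: `beta` (inverse temperature, where lower values approximate a stochastic generator), `alpha` (learning rate), and `kappa` (weight of a frequency-based memory that devalues recently chosen actions, thereby upvaluing unchosen ones).

```python
import math
import random


def softmax(values, beta):
    """Turn action values into choice probabilities (lower beta = noisier)."""
    m = max(values)
    exps = [math.exp(beta * (v - m)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def simulate(alpha, beta, kappa, n_trials=2000, n_actions=2, decay=0.8, seed=0):
    """Run one agent against a hypothetical adversary; return mean reward."""
    rng = random.Random(seed)
    q = [0.0] * n_actions      # learned action values
    freq = [0.0] * n_actions   # agent's decaying memory of its own choices
    counts = [0] * n_actions   # adversary's tally of the agent's choices
    total_reward = 0
    for _ in range(n_trials):
        # Strategy 3: frequently chosen actions are penalized, so unchosen
        # actions are upvalued relative to them.
        net = [q[a] - kappa * freq[a] for a in range(n_actions)]
        # Strategy 1: beta controls how stochastic the choice is.
        probs = softmax(net, beta)
        choice = rng.choices(range(n_actions), weights=probs)[0]
        # Assumed adversary: predicts the agent's most frequent action so far
        # and withholds reward when the prediction is correct.
        predicted = max(range(n_actions), key=lambda a: counts[a])
        reward = 0 if choice == predicted else 1
        # Strategy 2: alpha sets how quickly reinforcement and extinction act.
        q[choice] += alpha * (reward - q[choice])
        for a in range(n_actions):
            freq[a] = decay * freq[a] + (1.0 if a == choice else 0.0)
        counts[choice] += 1
        total_reward += reward
    return total_reward / n_trials
```

Calling, for example, `simulate(alpha=0.3, beta=3.0, kappa=0.0)` and `simulate(alpha=0.3, beta=3.0, kappa=1.0)` lets one compare mean reward with and without the frequency-based term under these assumptions. Results depend on the assumed adversary and should not be read as reproducing the reported findings.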