Evaluating Models of Human Behavior in an Adversarial Multi-Armed Bandit Problem

Marcus Gutierrez, Jakub Cerný, Noam Ben-Asher, Efrat Aharonov-Majar, Branislav Bosanský, Christopher Kiekintveld, Cleotilde Gonzalez

CogSci 2019 (modified: 14 Oct 2024)

Abstract: We consider the problem of predicting how humans learn interactively in an adversarial Multi-Armed Bandit (MAB) setting. We are motivated by the use of cyber deception in cybersecurity and the need to design effective decoys to lure attackers. We ran a behavioral study in which humans act as cyber attackers and try to learn the defender's strategy for repeatedly assigning network nodes as decoys. We tested humans against three defenses: a stationary strategy, a static game-theoretic solution, and an adaptive MAB strategy. Our results show that humans have the most difficulty learning against the adaptive defense. We also evaluated five models for predicting the observed human behavior and compared their predictive quality on our experimental data. A modified version of Thompson Sampling and a cognitive model based on Instance-Based Learning Theory best replicate the human learning observed in our data.
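For readers unfamiliar with the Thompson Sampling baseline named above, the following is a minimal sketch of the standard Beta-Bernoulli variant, written from the attacker's point of view. It is not the modified version evaluated in the paper, whose changes are not described here. The function name `thompson_sampling_attacker`, the per-node success probabilities, and the binary reward (1 if the attacked node turns out not to be a decoy) are illustrative assumptions. The simulated defender is stationary, so the sketch does not capture the adaptive defense condition.

```python
import random


def thompson_sampling_attacker(success_probs, rounds, seed=0):
    """Standard Beta-Bernoulli Thompson Sampling (hypothetical attacker model).

    success_probs: assumed per-node probabilities that an attack succeeds,
    i.e. that the attacked node is not a decoy. They are fixed over time,
    which models a stationary defender, not an adaptive one.
    Returns the sequence of chosen nodes and the final Beta posterior
    parameters (alpha, beta) for each node.
    """
    rng = random.Random(seed)
    n = len(success_probs)
    alpha = [1] * n  # Beta(1, 1) uniform prior: 1 + observed successes
    beta = [1] * n   # 1 + observed failures
    choices = []
    for _ in range(rounds):
        # Draw one plausible success rate per node from its posterior.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(n)]
        # Attack the node whose sampled success rate is highest.
        arm = max(range(n), key=lambda i: samples[i])
        # Simulate the outcome: reward 1 if the node was not a decoy.
        reward = 1 if rng.random() < success_probs[arm] else 0
        # Conjugate update of the chosen node's posterior.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        choices.append(arm)
    return choices, alpha, beta
```

For example, `thompson_sampling_attacker([0.1, 0.3, 0.9], rounds=500)` should concentrate most attacks on the third node, because its posterior quickly dominates the samples drawn for the other two. Sampling from the posterior, rather than always picking the current best estimate, is what lets the attacker keep exploring nodes that might still be better.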
