Abstract: We study the non-stationary stochastic multi-armed bandit (MAB) problem, which models real-world sequential decision-making under changing conditions. We begin with a thorough analysis of state-of-the-art algorithms in dynamically changing environments. To address the limitations of existing methods, we propose the Concept Drift Adaptive Bandit (CDAB) framework, which aims to capture and predict potential future concept-drift patterns in the reward distributions, enabling better adaptation in non-stationary environments. We evaluate the effectiveness of CDAB in extensive numerical experiments against both stationary and non-stationary state-of-the-art baselines, testing on artificial datasets as well as real-world data under different types of changing environments. The results show that CDAB exhibits strong empirical performance, outperforming existing methods in every variant tested.