A Robust Algorithm to Unifying Offline Causal Inference and Online Multi-armed Bandit Learning

Qiao Tang, Hong Xie

Published: 2021, Last Modified: 16 May 2023, Venue: ICDM 2021, Readers: Everyone

Abstract: Utilizing offline logged data to improve sequential or online decision making is drawing increasing attention. VirUCB is one of the latest notable algorithmic frameworks in this line of research, with both sound theoretical guarantees and strong empirical performance. However, two questions about VirUCB remain open: (1) how imbalanced offline logged data influences decision-making accuracy; and (2) how to schedule offline logged data across the decision-making horizon so as to reduce its consumption. We show that with imbalanced offline logged data, VirUCB can learn more slowly than the baseline algorithm that uses no offline logged data. This finding motivates the RobVirUCB algorithm, which is robust to such imbalanced data, i.e., it still maintains a fast learning speed. RobVirUCB adaptively selects “useful” offline logged data to speed up learning, and it has theoretical guarantees on regret. Finally, we design the EffVirUCB algorithm, which reduces the offline logged data consumption of RobVirUCB. EffVirUCB schedules the offline logged data to the decision rounds in which the decision maker may select suboptimal arms, and it also has theoretical guarantees on regret. Extensive experiments on both synthetic and real-world data validate the superior performance of RobVirUCB and EffVirUCB.
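
To make the setting concrete, the following is a minimal sketch of a standard UCB1 bandit whose per-arm statistics are warm-started from offline logged data. It is not VirUCB, RobVirUCB, or EffVirUCB, whose details are not given in this abstract; the function name `ucb_with_offline_logs`, the Bernoulli reward model, and the `(arm, reward)` log format are all assumptions made purely for illustration.

```python
import math
import random


def ucb_with_offline_logs(true_means, offline_logs, horizon, seed=0):
    """Generic UCB1 on Bernoulli arms, warm-started from offline logs.

    Illustrative only: this is NOT the paper's VirUCB/RobVirUCB/EffVirUCB.
    offline_logs is an assumed list of (arm_index, reward) pairs.
    Returns (cumulative pseudo-regret, online pulls per arm).
    """
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k   # offline + online samples per arm
    sums = [0.0] * k   # total observed reward per arm

    # Fold the offline logged data into the arm statistics.
    for arm, reward in offline_logs:
        counts[arm] += 1
        sums[arm] += reward

    online_pulls = [0] * k
    best_mean = max(true_means)
    regret = 0.0
    for _ in range(horizon):
        untried = [a for a in range(k) if counts[a] == 0]
        if untried:
            # Pull any arm that has no sample yet (offline or online).
            arm = untried[0]
        else:
            total = sum(counts)
            # UCB1 index: empirical mean plus exploration bonus.
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(total) / counts[a]),
            )
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        online_pulls[arm] += 1
        regret += best_mean - true_means[arm]
    return regret, online_pulls


if __name__ == "__main__":
    means = [0.5, 0.6]
    # Imbalanced logs: every offline sample comes from the suboptimal arm 0.
    imbalanced = [(0, 1.0)] * 50
    print("no offline data:", ucb_with_offline_logs(means, [], horizon=1000))
    print("imbalanced logs:", ucb_with_offline_logs(means, imbalanced, horizon=1000))
```

Comparing the two runs is one simple way to probe how logs concentrated on a single arm change online behaviour; the outcome depends on the seed and on the assumed parameters, so this sketch should not be read as reproducing the paper's findings.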
