Abstract: The dueling bandit model has been recognized as an effective analytic tool for sequential decision-making problems that rely on qualitative pairwise comparisons, such as comparing workers' task-completion quality in crowdsourcing systems or users' rankings of recommended items in recommender systems. In dueling bandits, an agent uses pairwise comparisons of selected arms to balance the exploration-exploitation trade-off while learning uncertainties online. Despite their wide application, a green deployment of dueling bandits must also account for the non-negligible energy cost of selecting arms, motivating a green dueling bandit model. In particular, such a model requires online control to adaptively optimize long-run energy costs for sustainable system deployment. We therefore 1) employ online learning to estimate uncertainties from qualitative pairwise comparisons, and 2) apply online control techniques to guarantee a within-budget energy cost for green real-world deployment. Accordingly, we propose a Green Dueling Bandit Learning (GDBL) algorithm that integrates dueling bandit learning, for the exploration-exploitation trade-off, with online control, for the optimization of energy costs. We prove that GDBL achieves a sublinear round-averaged regret while keeping the energy cost under budget. We conduct simulations to demonstrate that GDBL outperforms baseline methods.
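The abstract does not specify GDBL's internals, so the following is only a minimal sketch of how a dueling bandit learner could be coupled with online budget control. It assumes a UCB-style score over arm pairs and a Lyapunov-style virtual queue that penalizes pairs whose energy cost exceeds a per-round budget; the function name `gdbl_sketch`, the trade-off weight `V`, and the specific comparison score are all illustrative assumptions, not the paper's algorithm.

```python
import math
import random

def gdbl_sketch(n_arms, horizon, budget_per_round, costs, true_pref, seed=0):
    """Illustrative sketch (NOT the paper's GDBL): dueling bandit learning
    from pairwise comparisons, combined with a virtual queue that keeps the
    long-run average energy cost near a per-round budget.

    costs[i]      : assumed energy cost of pulling arm i.
    true_pref[i][j]: assumed probability that arm i beats arm j (i < j).
    """
    rng = random.Random(seed)
    # wins[i][j] counts how often arm i beat arm j in past comparisons.
    wins = [[0] * n_arms for _ in range(n_arms)]
    queue = 0.0        # virtual queue: accumulates energy-budget overshoot
    V = 5.0            # assumed learning-vs-cost trade-off weight
    total_cost = 0.0

    for t in range(1, horizon + 1):
        best = None
        for i in range(n_arms):
            for j in range(i + 1, n_arms):
                n = wins[i][j] + wins[j][i]
                p_hat = wins[i][j] / n if n else 0.5
                # Optimistic (UCB-style) value of comparing this pair.
                bonus = math.sqrt(2.0 * math.log(t + 1) / max(n, 1))
                pair_cost = costs[i] + costs[j]
                # Drift-plus-penalty style score: learning value weighted by V,
                # minus the queue-weighted energy cost of this comparison.
                score = V * (p_hat + bonus) - queue * pair_cost
                if best is None or score > best[0]:
                    best = (score, i, j)

        _, i, j = best
        pair_cost = costs[i] + costs[j]
        total_cost += pair_cost

        # Observe only a qualitative comparison outcome, never raw rewards.
        if rng.random() < true_pref[i][j]:
            wins[i][j] += 1
        else:
            wins[j][i] += 1

        # Queue grows when the round's cost exceeds the budget, drains otherwise,
        # steering future selections toward cheaper pairs when over budget.
        queue = max(0.0, queue + pair_cost - budget_per_round)

    return {"avg_cost": total_cost / horizon, "queue": queue, "wins": wins}
```

A usage example: with three arms of costs 1, 2, and 3 and a per-round budget of 4, the queue term suppresses the expensive (2, 3) pairing whenever the running cost drifts above budget, so the time-averaged cost settles near or below the budget while comparisons still concentrate on informative pairs.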