Abstract: We study online interaction in general decision-making problems,
where the objective is not only to find optimal strategies but also to satisfy
safety guarantees expressed in terms of accrued costs. We propose a theoretical
framework to address such problems and present BAN-SOLO, a UCB-like algorithm that, while interacting online with an unknown environment, attains sublinear regret of order O(T^{1/2}) and plays safely with high probability at each iteration. At its core, BAN-SOLO relies on tools from convex duality to manage exploration of the environment while satisfying the safety constraints imposed by the problem.