Data-dependent Bounds with T-Optimal Best-of-Both-Worlds Guarantees in Multi-Armed Bandits using Stability-Penalty Matching.

Quan Nguyen, Shinji Ito, Junpei Komiyama, Nishant A. Mehta

28 Sept 2025 | CoRR 2025 | CC BY-SA 4.0