Keywords: unimodal bandits, multi-armed bandits, auctions
TL;DR: We prove that a specific structured bandit problem motivated by online auctions can be tackled with locally greedy algorithms.
Abstract: Motivated by online display advertising, this work considers repeated second-price auctions, where agents sample their value from an unknown distribution with cumulative distribution function $F$. In each auction $t$, a decision-maker with access to only limited feedback selects $n_t$ agents from a coalition of $N$ to compete for a prize against $p$ other agents, aiming to maximize the coalition's cumulative reward across all auctions.
The problem is framed as an $N$-armed structured bandit, where each number $n$ of players sent is an arm, with expected reward $r(n)$ fully characterized by $F$ and $p+n$.
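For intuition, here is a minimal Monte Carlo sketch of one plausible instantiation of $r(n)$, assuming all $p+n$ values are drawn i.i.d. from $F$ and the coalition collects the winner's utility (highest value minus second price) whenever one of its $n$ agents wins; the function name `expected_reward` and the uniform choice of $F$ are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def expected_reward(n, p, sample_F, trials=100_000, rng=None):
    """Monte Carlo estimate of r(n): expected coalition utility per auction.

    Assumes (illustratively) a second-price auction where n coalition agents
    and p outside agents each draw an i.i.d. value from F; the coalition earns
    (highest value - second-highest value) iff a coalition agent wins.
    """
    rng = rng or np.random.default_rng(0)
    coalition = sample_F((trials, n), rng)        # coalition members' values
    outsiders = sample_F((trials, p), rng)        # competing agents' values
    all_bids = np.concatenate([coalition, outsiders], axis=1)
    top_two = np.sort(all_bids, axis=1)[:, -2:]   # [second price, first price]
    coalition_wins = coalition.max(axis=1) >= outsiders.max(axis=1)
    utility = (top_two[:, 1] - top_two[:, 0]) * coalition_wins
    return utility.mean()

# Example with F = Uniform(0, 1): sending more agents raises the win
# probability but also the second price, so r(n) comes out unimodal in n.
sample_unif = lambda shape, rng: rng.uniform(0.0, 1.0, size=shape)
print([round(expected_reward(n, p=5, sample_F=sample_unif), 4) for n in range(1, 8)])
```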
We present two algorithms, Local-Greedy (LG) and Greedy-Grid (GG), both achieving *constant* problem-dependent regret. This relies on three key ingredients: **1.** an estimator of $r(n)$ from feedback collected on any arm $k$, **2.** concentration bounds on these estimates for $k$ within an estimation neighborhood of $n$, and **3.** the unimodality of $r$ under standard assumptions on $F$. Additionally, GG enjoys problem-independent guarantees on top of the best problem-dependent ones. However, by avoiding reliance on confidence intervals, LG outperforms GG in practice, as well as standard unimodal bandit algorithms such as OSUB and classical multi-armed bandit algorithms.
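For concreteness, a minimal sketch of the locally greedy principle on a unimodal bandit, assuming noisy per-arm rewards and a plain empirical-mean estimator; `LocalGreedySketch` and its move rule are a hedged illustration of the TL;DR, not the paper's LG algorithm, which additionally transfers feedback across arms via the cross-arm estimator of ingredient **1**.

```python
import numpy as np

class LocalGreedySketch:
    """Illustrative locally greedy routine on a unimodal N-armed bandit.

    At each step, pull the current arm and its immediate neighbors once,
    then hop to the neighbor with the highest empirical mean. Under
    unimodality of the expected reward, such local moves drift toward
    the mode without maintaining confidence intervals.
    """

    def __init__(self, n_arms):
        self.n_arms = n_arms
        self.sums = np.zeros(n_arms)
        self.counts = np.zeros(n_arms)
        self.arm = n_arms // 2  # arbitrary starting arm

    def _mean(self, k):
        return self.sums[k] / max(self.counts[k], 1)

    def step(self, pull):
        """One round: sample the local neighborhood, then move greedily."""
        lo, hi = max(self.arm - 1, 0), min(self.arm + 1, self.n_arms - 1)
        for k in range(lo, hi + 1):
            reward = pull(k)
            self.sums[k] += reward
            self.counts[k] += 1
        self.arm = max(range(lo, hi + 1), key=self._mean)
        return self.arm

# Usage on a synthetic unimodal instance with Gaussian noise.
means = np.array([0.1, 0.3, 0.5, 0.45, 0.2])
rng = np.random.default_rng(0)
agent = LocalGreedySketch(len(means))
for _ in range(500):
    agent.step(lambda k: means[k] + 0.1 * rng.standard_normal())
print("settled on arm", agent.arm)  # typically the mode, arm 2
```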
Supplementary Material: zip
Primary Area: Bandits
Submission Number: 3252