Keywords: Bandits, Auctions, Concentration inequalities
Abstract: Motivated by online display advertising, this work considers repeated second-price
auctions, where agents sample their value from an unknown distribution with
cumulative distribution function F. In each auction t, a decision-maker constrained by
limited observations selects n_t agents from a coalition of N to compete for
a prize against p other agents, aiming to maximize the cumulative reward of the
coalition across all auctions. The problem is framed as an N-armed structured
bandit, each number of players sent being an arm n, with expected reward r(n)
fully characterized by F and p + n. We present two algorithms, Local-Greedy (LG)
and Greedy-Grid (GG), both achieving constant problem-dependent regret. This
relies on three key ingredients: 1. an estimator of r(n) built from feedback collected
on any arm k, 2. concentration bounds for these estimates when k lies within an
estimation neighborhood of n, and 3. the unimodality of r under standard
assumptions on F. Additionally, GG enjoys problem-independent guarantees
on top of the best problem-dependent guarantees. However, because it avoids relying on
confidence intervals, LG outperforms GG in practice, as well as standard unimodal
bandit algorithms such as OSUB and classical multi-armed bandit algorithms.
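For intuition, here is a minimal sketch of the setting, not the paper's actual LG or GG algorithms. It assumes F = Uniform[0, 1] (a hypothetical choice), truthful bidding, and that the coalition's reward in an auction is the winner's value minus the second-highest bid whenever the winner is one of its agents; Monte Carlo estimates stand in for the bandit feedback and concentration machinery, and the helper names `simulate_r` and `local_greedy_argmax` are invented for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_r(n: int, p: int, num_samples: int = 200_000) -> float:
    """Monte Carlo estimate of r(n), assuming all n + p bidders draw i.i.d.
    values from F = Uniform[0, 1] and bid truthfully; the coalition earns the
    winner's value minus the second-highest bid whenever the winner is one of
    its n agents (an assumed reward model, for illustration only)."""
    values = rng.random((num_samples, n + p))      # one row per simulated auction
    order = np.sort(values, axis=1)                # ascending order statistics
    gap = order[:, -1] - order[:, -2]              # highest value minus second price
    coalition_wins = values.argmax(axis=1) < n     # first n columns = coalition agents
    return float(np.mean(gap * coalition_wins))

def local_greedy_argmax(p: int, N: int) -> int:
    """Hill-climb on the unimodal curve n -> r(n): move to the best of the
    current arm and its two neighbors until no neighbor improves. This mirrors
    the spirit of exploiting unimodality, not the paper's LG algorithm."""
    cache: dict[int, float] = {}
    def r(k: int) -> float:
        if k not in cache:
            cache[k] = simulate_r(k, p)
        return cache[k]
    n = 1
    while True:
        best = max((m for m in (n - 1, n, n + 1) if 1 <= m <= N), key=r)
        if best == n:
            return n
        n = best

if __name__ == "__main__":
    p, N = 5, 20
    print("estimated best coalition size:", local_greedy_argmax(p, N))
```

Under this uniform-value assumption the curve r(n) is indeed unimodal, so the local search above settles near the optimal coalition size after visiting only a handful of arms.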
Confirmation: I understand that authors of each paper submitted to EWRL may be asked to review 2-3 other submissions to EWRL.
Serve As Reviewer: ~Hugo_Richard1
Track: Fast Track: published work
Publication Link: https://proceedings.neurips.cc/paper_files/paper/2024/hash/22c799f287fd05e7174fd65a3ce134af-Abstract-Conference.html
Submission Number: 8