Off-Policy Confidence Sequences

2021 (modified: 15 Sept 2022) | ICML 2021 | Readers: Everyone
Abstract: We develop confidence bounds that hold uniformly over time for off-policy evaluation in the contextual bandit setting. These confidence sequences are based on recent ideas from martingale analysis ...