Readers: everyone, since 04 Oct 2024. License: CC BY 4.0
We study the problem of best-item identification from relative feedback, where a learner adaptively plays subsets of items and receives stochastic feedback in the form of the best item in each played set. We propose an algorithm, Dynamic Elimination (DE), that dynamically prunes sub-optimal items from contention to identify the best item efficiently, and we show a strong sample complexity upper bound for it. We further formalize the notion of inferred updates, which leverage item correlation information to estimate the win rates of items without playing them directly. We propose the Dynamic Elimination by Correlation (DEBC) algorithm as an extension of DE with inferred updates. Extensive experiments show that DE and DEBC significantly outperform all existing baselines across multiple datasets and in various settings.
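To make the feedback model concrete, the following is a minimal illustrative sketch of subset play with elimination. It is not the DE or DEBC algorithm from the paper, whose elimination rule and confidence bounds are not given in this abstract. The sketch assumes that the winner of a played subset is drawn with probability proportional to a hidden item weight (a multinomial-logit style assumption), and the names `play_subset`, `identify_best`, `plays_per_group`, and `margin` are hypothetical. Inferred updates from item correlations are not modeled.

```python
import random


def play_subset(weights, subset, rng):
    """Simulate one play: return the winning item of `subset`.

    Assumption: the winner is sampled with probability proportional to
    its hidden weight. The abstract does not specify the feedback model.
    """
    return rng.choices(subset, weights=[weights[i] for i in subset], k=1)[0]


def identify_best(weights, subset_size=3, plays_per_group=200, margin=0.1, seed=0):
    """Toy elimination loop (hypothetical, not the paper's DE).

    Each phase shuffles the active items, splits them into groups, plays
    every group `plays_per_group` times, and drops items whose empirical
    win rate trails the group leader by more than `margin`.
    Returns (surviving item index, total number of plays).
    """
    if subset_size < 2:
        raise ValueError("subset_size must be at least 2")
    rng = random.Random(seed)
    active = list(range(len(weights)))
    total_plays = 0
    while len(active) > 1:
        rng.shuffle(active)
        survivors = []
        for start in range(0, len(active), subset_size):
            group = active[start:start + subset_size]
            if len(group) == 1:
                # A lone item cannot be compared this phase; carry it over.
                survivors.extend(group)
                continue
            wins = {i: 0 for i in group}
            for _ in range(plays_per_group):
                wins[play_subset(weights, group, rng)] += 1
            total_plays += plays_per_group
            rates = {i: wins[i] / plays_per_group for i in group}
            leader_rate = max(rates.values())
            kept = [i for i in group if rates[i] >= leader_rate - margin]
            if len(kept) == len(group):
                # Force progress: drop the weakest item so the loop terminates.
                kept.remove(min(kept, key=lambda i: rates[i]))
            survivors.extend(kept)
        active = survivors
    return active[0], total_plays


# Item 0 has the largest hidden weight, so it should usually survive.
best, plays = identify_best([1.0, 0.2, 0.2, 0.2, 0.2, 0.2])
print(best, plays)
```

The fixed `margin` stands in for the statistical confidence test that a real elimination algorithm would use, and the forced removal of the weakest item is only there to guarantee termination in this toy version.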