Observation-Free Attacks on Online Learning to Rank

Published: 23 Sept 2025, Last Modified: 18 Nov 2025
Venue: ACA-NeurIPS2025 Oral
License: CC BY 4.0
Keywords: Observation-free Attack, Reward Manipulation, Adversarial Attack, Online Learning to Rank
TL;DR: This paper presents novel observation-free reward manipulation strategies to promote a set of target items in the Online Learning to Rank (OLTR) framework.
Abstract: Online learning to rank (OLTR) plays a critical role in information retrieval and machine learning systems, with a wide range of applications in search engines and content recommenders. Despite this extensive adoption, however, the susceptibility of OLTR algorithms to coordinated adversarial attacks remains poorly understood. In this work, we present a novel framework for attacking several widely used OLTR algorithms. Our framework is designed to promote a set of target items so that they appear in the list of top-$K$ recommendations for $T - o(T)$ rounds, while simultaneously inducing linear regret in the learning algorithm. We propose two novel attack strategies: CascadeOFA for CascadeUCB1 and PBMOFA for PBM-UCB. We provide theoretical guarantees showing that both strategies require only $O(\log T)$ manipulations to succeed, and we supplement our theoretical analysis with empirical results on real-world data.
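To make the threat model concrete, below is a minimal simulation sketch of the general idea of observation-free reward manipulation against a CascadeUCB1-style learner. It is not the paper's CascadeOFA algorithm: the item set, the attraction probabilities `theta`, the exploration constant, and the attacker's trigger rule (overwriting all feedback with zeros whenever the target item is absent from the ranked list, without ever reading the realized clicks) are all illustrative assumptions, and the sketch makes no claim about the $O(\log T)$ manipulation bound.

```python
import numpy as np

# Toy sketch of an observation-free reward-manipulation attack on a
# CascadeUCB1-style learner. All quantities below (L, K, T, theta, the
# exploration constant, the attacker's trigger rule) are illustrative
# assumptions, not the paper's CascadeOFA algorithm or its guarantees.
rng = np.random.default_rng(0)
L, K, T = 8, 3, 20_000
theta = np.linspace(0.9, 0.2, L)        # assumed true attraction probabilities
target = L - 1                          # item the attacker wants promoted

counts = np.zeros(L)                    # number of observations per item
means = np.zeros(L)                     # empirical click rate per item
manipulated_rounds = 0
target_shown_rounds = 0

for t in range(1, T + 1):
    # CascadeUCB1-style index: empirical mean plus an exploration bonus;
    # unobserved items get an infinite index to force initial exploration.
    bonus = np.sqrt(1.5 * np.log(t) / np.maximum(counts, 1))
    ucb = np.where(counts > 0, means + bonus, np.inf)
    ranking = np.argsort(-ucb)[:K]

    # Cascade click model: the user scans top-down and clicks the first
    # attractive item; items after the click are not examined.
    clicks = np.zeros(K, dtype=int)
    for pos, item in enumerate(ranking):
        if rng.random() < theta[item]:
            clicks[pos] = 1
            break

    if target in ranking:
        target_shown_rounds += 1
    else:
        # Observation-free manipulation: without reading the realized clicks,
        # overwrite the whole feedback vector with zeros whenever the target
        # item is absent from the recommended list.
        clicks[:] = 0
        manipulated_rounds += 1

    # Learner update: with no click, all K positions count as observed;
    # otherwise only positions up to and including the click are observed.
    last = int(np.argmax(clicks)) if clicks.any() else K - 1
    for pos in range(last + 1):
        item = ranking[pos]
        counts[item] += 1
        means[item] += (clicks[pos] - means[item]) / counts[item]

print(f"target shown in top-{K}: {target_shown_rounds}/{T} rounds, "
      f"feedback manipulated in {manipulated_rounds} rounds")
```

In this toy setting the suppressed feedback drags the empirical means of non-target items toward zero, so the target item eventually stays in the ranked list and the attacker intervenes in relatively few later rounds; the paper's actual strategies and their analysis should be consulted for the precise conditions and bounds.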
Submission Number: 5