Adversarial BEIR: Benchmarking Information Retrieval Models Against Query Perturbations

ACL ARR 2025 May Submission3570 Authors

19 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0
Abstract: Information retrieval plays a crucial role in many applications, serving as the primary mechanism for accessing relevant data within large and complex datasets. This study investigates the robustness of retrievers against adversarial queries, employing 17 distinct query perturbation techniques across three granularity levels: character, word, and sentence. Our findings reveal that top-performing retrievers exhibit significant vulnerabilities to these adversarial queries, resulting in notable performance degradation. Additionally, we explore the capability of Large Language Models (LLMs) to generate adversarial queries autonomously, without human intervention. By prompting LLMs to create paraphrases of queries and subsequently annotating these using both automated and manual methods, we assess their effectiveness in this task. We introduce Adversarial BEIR, a comprehensive benchmark for measuring the robustness of retrievers to adversarial queries. By sharing our benchmark and detailed methods, we enable researchers to evaluate the robustness of their retrievers and create additional adversarial samples.
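For readers unfamiliar with multi-granularity query perturbation, the sketch below illustrates the three levels named in the abstract (character, word, and sentence). The specific perturbations and function names are illustrative assumptions for exposition only; they are not the 17 techniques or the LLM prompting setup used in the paper.

```python
# Minimal sketch of character-, word-, and sentence-level query perturbations.
# These are hypothetical examples, not the paper's actual perturbation suite.
import random

random.seed(0)

def char_perturb(query: str) -> str:
    """Character level: swap two adjacent characters in the query."""
    chars = list(query)
    idx = random.randrange(len(chars) - 1)
    chars[idx], chars[idx + 1] = chars[idx + 1], chars[idx]
    return "".join(chars)

def word_perturb(query: str) -> str:
    """Word level: swap two neighbouring words."""
    words = query.split()
    if len(words) < 2:
        return query
    idx = random.randrange(len(words) - 1)
    words[idx], words[idx + 1] = words[idx + 1], words[idx]
    return " ".join(words)

def sentence_perturb(query: str) -> str:
    """Sentence level: stand-in for an LLM-generated paraphrase.
    In practice this step would prompt an LLM to rephrase the query."""
    return f"Could you tell me: {query.rstrip('?')}?"

query = "what causes ocean tides"
print(char_perturb(query))      # e.g. "what causes ocena tides"
print(word_perturb(query))      # e.g. "what ocean causes tides"
print(sentence_perturb(query))  # e.g. "Could you tell me: what causes ocean tides?"
```

A robustness benchmark in this style would run the original and perturbed queries through the same retriever and compare retrieval metrics (e.g. nDCG@10) to quantify degradation.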
Paper Type: Long
Research Area: Information Retrieval and Text Mining
Research Area Keywords: passage retrieval, dense retrieval, benchmarking, automatic creation and evaluation of language resources, NLP datasets, automatic evaluation of datasets, generalization, retrieval, adversarial examples, robustness
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Data resources
Languages Studied: English
Submission Number: 3570