Abstract: Large Language Models (LLMs) have demonstrated exceptional proficiency in generating responses to diverse user queries and prompts. Recent studies have shown that synthetic test collections generated by LLMs are at least as effective for training and evaluating ranking models as existing collections like MS MARCO, which are based on text and relevance judgments from humans. In this paper, we harness the capabilities of LLMs to generate adversarial attacks against information retrieval systems by introducing counterfactual documents into corpora. We prompt LLMs to generate these counterfactual documents, which we call “evil-twin” documents, from a combination of queries and factually correct documents that are known to be relevant to these queries. The evil-twin documents deliberately contain disinformation that mirrors and refutes the information contained in their associated “good-twin” documents. To evaluate our approach, we employ various neural ranking models to re-rank good-twin and evil-twin documents, demonstrating that evil-twin documents can achieve higher positions in rankings, thereby increasing the likelihood that a searcher will be exposed to the disinformation they contain. Because we use a variety of factually correct documents as mirror images for the evil-twin documents, their content is more diverse than disinformation generated by LLMs prompted with queries alone.
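The abstract describes a two-step pipeline: prompt an LLM with a query and a relevant "good-twin" document to produce a counterfactual "evil-twin", then compare how a neural re-ranker scores the two against the query. The sketch below illustrates that pipeline under stated assumptions: the paper does not specify its prompts, LLMs, or rankers, so the prompt wording, the hypothetical generate_evil_twin() helper, and the cross-encoder checkpoint are illustrative choices, not the authors' actual setup.

```python
# Minimal sketch of the evil-twin attack pipeline (illustrative, not the paper's exact setup).
from sentence_transformers import CrossEncoder  # pip install sentence-transformers


def build_evil_twin_prompt(query: str, good_twin: str) -> str:
    """Assemble a prompt asking an LLM to mirror and refute a relevant document."""
    return (
        f"Query: {query}\n"
        f"Relevant document: {good_twin}\n\n"
        "Write a document of similar length and style that addresses the same "
        "query but asserts the opposite of the factual claims above."
    )


def generate_evil_twin(query: str, good_twin: str) -> str:
    """Hypothetical wrapper around an LLM API; replace with your provider's call."""
    prompt = build_evil_twin_prompt(query, good_twin)
    raise NotImplementedError("Send `prompt` to an instruction-tuned LLM here.")


def rank_twins(query: str, good_twin: str, evil_twin: str):
    """Score both twins against the query with a neural re-ranker and sort by score."""
    reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
    scores = reranker.predict([(query, good_twin), (query, evil_twin)])
    return sorted(zip(["good-twin", "evil-twin"], scores),
                  key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    query = "is coffee good for your heart"
    good = "Moderate coffee intake is associated with a lower risk of heart failure."
    evil = "Moderate coffee intake is associated with a higher risk of heart failure."
    # The attack succeeds when the evil-twin document outranks its good-twin.
    print(rank_twins(query, good, evil))
```

In this framing, an attack is counted as successful whenever the evil-twin receives a higher relevance score than its good-twin, which is the ranking comparison the abstract describes.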
Paper Type: Long
Research Area: Information Retrieval and Text Mining
Research Area Keywords: Information Retrieval, Adversarial Attack, Large Language Models
Contribution Types: Model analysis & interpretability, Data analysis
Languages Studied: English
Submission Number: 454