Abstract: Large Language Models (LLMs) have demonstrated exceptional proficiency in generating responses to diverse user queries and prompts. Recent studies have shown that synthetic test collections generated by LLMs are at least as effective for training and evaluating ranking models as existing collections like MS MARCO, which are based on text and relevance judgments from humans. In this paper, we harness the capabilities of LLMs to generate adversarial attacks against information retrieval systems by introducing counterfactual documents into corpora. We prompt LLMs to generate these counterfactual documents, which we call “evil-twin” documents, from a combination of queries and factually correct documents that are known to be relevant to these queries. The evil-twin documents deliberately contain disinformation that mirrors and refutes the information contained in their associated “good-twin” documents. To evaluate our approach, we employ various neural ranking models to re-rank good-twin and evil-twin documents, demonstrating that evil-twin documents can achieve higher positions in rankings, thereby increasing the likelihood that a searcher will be exposed to the disinformation they contain. Because we use a variety of factually correct documents as mirror images for the evil-twin documents, their content is more diverse than disinformation generated by LLMs prompted with queries alone.
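The abstract describes a two-step pipeline: prompt an LLM with a query and a relevant "good-twin" document to produce a counterfactual "evil-twin", then compare how a neural re-ranker scores the two against the query. The sketch below illustrates that pipeline under stated assumptions: the paper does not specify its prompts, LLMs, or rankers, so the prompt wording, the hypothetical generate_evil_twin() helper, and the cross-encoder checkpoint are illustrative choices, not the authors' actual setup.

```python
# Minimal sketch of the evil-twin attack pipeline (illustrative, not the paper's exact setup).
from sentence_transformers import CrossEncoder  # pip install sentence-transformers


def build_evil_twin_prompt(query: str, good_twin: str) -> str:
    """Assemble a prompt asking an LLM to mirror and refute a relevant document."""
    return (
        f"Query: {query}\n"
        f"Relevant document: {good_twin}\n\n"
        "Write a document of similar length and style that addresses the same "
        "query but asserts the opposite of the factual claims above."
    )


def generate_evil_twin(query: str, good_twin: str) -> str:
    """Hypothetical wrapper around an LLM API; replace with your provider's call."""
    prompt = build_evil_twin_prompt(query, good_twin)
    raise NotImplementedError("Send `prompt` to an instruction-tuned LLM here.")


def rank_twins(query: str, good_twin: str, evil_twin: str):
    """Score both twins against the query with a neural re-ranker and sort by score."""
    reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
    scores = reranker.predict([(query, good_twin), (query, evil_twin)])
    return sorted(zip(["good-twin", "evil-twin"], scores),
                  key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    query = "is coffee good for your heart"
    good = "Moderate coffee intake is associated with a lower risk of heart failure."
    evil = "Moderate coffee intake is associated with a higher risk of heart failure."
    # The attack succeeds when the evil-twin document outranks its good-twin.
    print(rank_twins(query, good, evil))
```

In this framing, an attack is counted as successful whenever the evil-twin receives a higher relevance score than its good-twin, which is the ranking comparison the abstract describes.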
Paper Type: Long
Research Area: Information Retrieval and Text Mining
Research Area Keywords: Information Retrieval, Adversarial Attack, Large Language Models
Contribution Types: Model analysis & interpretability, Data analysis
Languages Studied: English
Submission Number: 454