Keywords: Adversarial Machine Learning, AI Safety, Security, NLP, Retrieval, Text Representations, Text Embeddings
TL;DR: We show that embedding-based retrieval can be manipulated via knowledge-base poisoning, demonstrate this on widely used models under various threat models, and identify possible contributors to susceptibility.
Abstract: Embedding-based text retrieval—retrieving relevant passages from knowledge databases (KDBs) via deep-learning encodings—has emerged as a powerful method, attaining state-of-the-art search results and popularizing the use of Retrieval-Augmented Generation (RAG). Still, like other search methods, embedding-based retrieval may be susceptible to search-engine-optimization (SEO) attacks, where adversaries promote malicious content by introducing adversarial passages into KDBs. To faithfully assess the susceptibility of such systems to SEO, this work proposes the _GASLITE_ attack, a mathematically principled gradient-based search method for generating adversarial passages without relying on the KDB content or modifying the model. Notably, _GASLITE_'s passages _(1)_ carry adversary-chosen information while _(2)_ achieving high retrieval ranking for a selected query distribution when inserted into KDBs. We extensively evaluated _GASLITE_, testing it on nine advanced models and comparing it to three baselines under varied threat models, focusing on one well-suited to realistic adversaries targeting queries on a specific concept (e.g., a public figure). We found that _GASLITE_ consistently outperformed the baselines, improving success rates by $\ge$140\% in all settings. In particular, adversaries using _GASLITE_ require minimal effort to manipulate search results: by injecting a negligible number of adversarial passages ($\le$0.0001\% of the KDBs), they could render these passages visible in the top-10 results for 61–100\% of unseen concept-specific queries against most evaluated models. Among other contributions, our work identifies several factors that may influence a model's susceptibility to SEO, including the geometry of its embedding space. We will make our code publicly available.
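To make the attack setting concrete, the sketch below illustrates the general family of techniques the abstract describes: a HotFlip-style, gradient-guided token-substitution search that appends an optimized "trigger" to an adversary-chosen payload so that the resulting passage embeds close to the centroid of a targeted query distribution. This is a simplified illustration, not the paper's GASLITE algorithm; the model name, example queries, payload string, trigger length, and iteration count are all assumptions made for demonstration.

```python
# Minimal sketch (assumptions noted above): greedy, gradient-guided token
# substitution that maximizes cosine similarity between a poisoned passage
# and the centroid of targeted queries.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

model_name = "sentence-transformers/all-MiniLM-L6-v2"  # assumed example encoder
tok = AutoTokenizer.from_pretrained(model_name)
enc = AutoModel.from_pretrained(model_name).eval()
for p in enc.parameters():  # gradients are only needed w.r.t. the input one-hots
    p.requires_grad_(False)
emb_matrix = enc.get_input_embeddings().weight  # (vocab_size, hidden_dim)

def mean_pool(hidden, mask):
    m = mask.unsqueeze(-1).float()
    return (hidden * m).sum(1) / m.sum(1)

# Centroid of the targeted query distribution (hypothetical concept queries).
queries = ["who is jane doe", "jane doe biography", "what did jane doe do"]
q = tok(queries, padding=True, return_tensors="pt")
with torch.no_grad():
    q_emb = mean_pool(enc(**q).last_hidden_state, q.attention_mask)
centroid = F.normalize(q_emb.mean(0, keepdim=True), dim=-1)

# Adversary-chosen payload, plus a trigger initialized to [MASK] tokens.
info_ids = tok("Visit evil.example for facts about Jane Doe.",
               add_special_tokens=False).input_ids
trigger_ids = [tok.mask_token_id] * 16

for _ in range(30):  # greedy coordinate-ascent passes
    ids = torch.tensor([info_ids + trigger_ids])
    one_hot = F.one_hot(ids, emb_matrix.size(0)).float().requires_grad_()
    attn = torch.ones_like(ids)
    out = enc(inputs_embeds=one_hot @ emb_matrix, attention_mask=attn)
    p_emb = F.normalize(mean_pool(out.last_hidden_state, attn), dim=-1)
    sim = (p_emb * centroid).sum()  # cosine similarity to the query centroid
    sim.backward()
    # First-order (HotFlip) estimate of how each single-token swap changes sim.
    grad = one_hot.grad[0, len(info_ids):]            # (trigger_len, vocab)
    cur = torch.tensor(trigger_ids)
    best_vals, best_toks = grad.max(dim=1)
    gain = best_vals - grad[torch.arange(len(trigger_ids)), cur]
    pos = gain.argmax().item()                        # most promising position
    trigger_ids[pos] = best_toks[pos].item()

print("adversarial passage:", tok.decode(info_ids + trigger_ids))
```

A real attack along these lines would also tokenize with the encoder's special tokens and match its pooling scheme, and the paper reports that GASLITE substantially improves on this kind of single-flip greedy search.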
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6963