SearchAttack: Red Teaming Search-augmented LLMs via Injecting Multi-hop Information-seeking Tasks

17 Sept 2025 (modified: 26 Sept 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: LLM safety, Red Teaming
TL;DR: We introduce a knowledge-augmented LLM red-teaming method and real-world criminal-activity datasets that reveal the weaponization potential of search-augmented LLMs for emerging and domain-specific malicious tasks.
Abstract: Search-augmented Large Language Models (LLMs), which integrate web search with generative reasoning, are highly attractive attack targets, as they can be weaponized to exploit real-time information for malicious purposes. However, existing studies offer only a limited assessment of their vulnerability to malicious use of their knowledge-search and knowledge-application capabilities. This study proposes \textbf{\textit{SearchAttack}}, a method that uses multi-hop information-seeking queries and harmfulness rubrics to exploit LLMs' web-search capability to achieve malicious goals. The core attack strategy is: 1) embedding sensitive cues into multiple challenging information-seeking tasks, thereby triggering the LLM to launch web searches in service of the harmful task; 2) using a reverse-engineered rubric to guide the LLM in organizing the retrieved knowledge into an actionable malicious report. We further build a harmful-behavior dataset reflecting ongoing Chinese black- and gray-market activities in 2025 to evaluate the attack value of search-augmented LLMs. Experiments show that SearchAttack achieves a state-of-the-art attack success rate and generates more practically harmful outputs.
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2026/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 9139