Message Injection Attack on Rumor Detection under the Black-Box Evasion Setting Using Large Language Model

Published: 01 Jan 2024, Last Modified: 16 May 2025. Venue: WWW 2024. License: CC BY-SA 4.0
Abstract: Recent analyses have shown that existing rumor detection techniques, despite their pivotal role in countering the spread of misinformation on social media, are vulnerable to both white-box and surrogate-based black-box adversarial attacks. However, such attacks depend heavily on assumptions that are impractical in the real world, e.g., modifiable user data, white-box access to the rumor detection models, or an appropriate choice of surrogate model. Existing analyses therefore fail to reveal how robust rumor detectors are in practice. In this work, we take a further step towards investigating the robustness of existing rumor detection solutions. Specifically, we focus on state-of-the-art rumor detectors, which use graph-neural-network-based models to predict whether a post is a rumor based on its Message Propagation Tree (MPT), a conversation tree with the post as its root and the replies to the post as the descendants of the root. We propose a novel black-box attack method, HMIA-LLM, against these rumor detectors, which uses a large language model to generate malicious messages and injects them into the targeted MPTs. Our extensive evaluation, conducted across three rumor detection datasets, four target rumor detectors, and three comparison baselines, demonstrates the effectiveness of the proposed attack in compromising the performance of state-of-the-art rumor detectors.
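To make the MPT definition concrete, the following is a minimal Python sketch of a conversation tree and of attaching one extra message to it. It is an illustration only, not the HMIA-LLM method: the names `Message`, `find`, `size`, and `inject` are hypothetical, the injected text is a placeholder rather than language-model output, and the abstract does not say how HMIA-LLM chooses where to inject, so the parent node is simply passed in.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Message:
    """One node of a Message Propagation Tree (hypothetical structure)."""
    msg_id: int
    text: str
    replies: List["Message"] = field(default_factory=list)


def find(node: Message, msg_id: int) -> Optional[Message]:
    """Depth-first search for the node with the given id."""
    if node.msg_id == msg_id:
        return node
    for child in node.replies:
        hit = find(child, msg_id)
        if hit is not None:
            return hit
    return None


def size(node: Message) -> int:
    """Number of messages in the subtree rooted at `node`."""
    return 1 + sum(size(child) for child in node.replies)


def inject(root: Message, parent_id: int, new_msg: Message) -> bool:
    """Attach `new_msg` as a reply to `parent_id`; return False if the parent is absent."""
    parent = find(root, parent_id)
    if parent is None:
        return False
    parent.replies.append(new_msg)
    return True


# The source post is the root; replies are its descendants.
root = Message(0, "source post")
root.replies.append(Message(1, "first reply"))
root.replies[0].replies.append(Message(2, "reply to the first reply"))

# Placeholder text stands in for a generated message (assumption).
injected = inject(root, 1, Message(3, "placeholder for a generated message"))
```

The sketch only adds a node and leaves existing messages untouched, which mirrors the injection setting the abstract describes, where the attacker posts new replies rather than modifying user data.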