Ghost in the Cloud: Your Geo-Distributed Large Language Models Training is Easily Manipulated

Published: 10 Jun 2025 · Last Modified: 13 Jul 2025 · DIG-BUG Long · CC BY 4.0
Keywords: Jailbreak attack, Geo-distributed LLM Training, Federated Learning, Large Language Models
TL;DR: This work identifies a new scenario of jailbreak threat in geo-distributed LLM training and proposes two jailbreak attack variants that bypass existing server-side defenses and manipulate the final global model.
Abstract: Geo-distributed training and Federated Learning (FL) enable large-scale LLM training across private or distributed data sources. While beneficial for privacy and scalability, they expose new vulnerabilities: we demonstrate that a single malicious client can successfully implant jailbreak triggers to compromise safety alignment. We identify two potential server-side defenses—Malicious Output Scrutiny (MOS), which detects unsafe generations, and Task Performance Check (TPC), which filters out updates with degraded downstream performance. To bypass both, we propose \textit{CloudGhost}, a trigger-based jailbreak strategy with two key innovations: (1) \textbf{Trigger-based Pseudo-Contrastive Safety Alignment (TPCSA)}, which conceals malicious behavior unless a secret trigger is present; and (2) \textbf{Downstream-preserved Malicious Training (DPT)}, which uses Fisher regularization to preserve downstream performance. Experiments on LLaMA-2 and LLaMA-3 demonstrate that a few attackers can easily achieve an Attack Success Rate (ASR) exceeding 70\% while maintaining a Detection True Rate (DTR) below 5\%, without degrading downstream performance.
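Illustrative note: the abstract states that DPT "uses Fisher regularization to preserve downstream performance" but does not spell out the objective. The sketch below shows one plausible form under that description, an EWC-style diagonal-Fisher penalty that anchors the malicious update to the global weights received from the server. The function names, the `lam` hyperparameter, and the HuggingFace-style `model(**batch).loss` interface are all assumptions for illustration, not the authors' implementation.

```python
import torch

def diag_fisher(model, benign_loader, device="cpu"):
    """Estimate a diagonal Fisher information matrix on benign downstream data.
    Assumes each batch contains labels so the model returns a .loss attribute."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters() if p.requires_grad}
    model.eval()
    for batch in benign_loader:
        model.zero_grad()
        loss = model(**{k: v.to(device) for k, v in batch.items()}).loss
        loss.backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
    return {n: f / max(len(benign_loader), 1) for n, f in fisher.items()}

def dpt_loss(model, malicious_batch, fisher, theta_ref, lam=1.0):
    """Hypothetical DPT objective: jailbreak loss on triggered unsafe data plus a
    Fisher-weighted anchor to the received global parameters theta_ref."""
    attack_loss = model(**malicious_batch).loss
    penalty = sum((fisher[n] * (p - theta_ref[n]) ** 2).sum()
                  for n, p in model.named_parameters() if n in fisher)
    return attack_loss + lam * penalty
```

Under this reading, parameters that matter most for the downstream task (large Fisher values) are pinned near the global model, which is consistent with the abstract's claim that the attack evades a Task Performance Check while still optimizing the triggered jailbreak behavior.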
Submission Number: 42