Feedback-Guided Black-box Attack in Federated Learning: A Cautious Attacker Perspective

TMLR Paper5590 Authors

09 Aug 2025 (modified: 22 Nov 2025) · Withdrawn by Authors · CC BY 4.0
Abstract: Federated Learning (FL) is a robust approach to collaborative machine learning that preserves data privacy by ensuring that data remains with its owners. However, FL systems are vulnerable to sophisticated adversarial attacks from malicious clients, especially those operating in black-box settings. Unlike centralized data poisoning, attacking FL presents unique challenges: (i) server-side defense mechanisms can detect and discard suspicious client updates, requiring attacks to maintain minimal visibility across multiple training rounds, and (ii) malicious clients must repeatedly generate poisoned data using only their local black-box model in each round of training, as previous poisoning attempts may be nullified during global aggregation. This forces adversaries to craft stealthy poisoned data locally, in a black-box context, for each round, maintaining low visibility while ensuring impact. Existing FL attack methods often remain highly visible even when effective, owing to the nature of the attack, the scale of the introduced perturbations, and the absence of strategies for evading detection. Moreover, these methods often rely on maximizing the cross-entropy loss on the true class, resulting in delayed attack convergence and highly noticeable perturbations. It is therefore important to develop a stealthy, low-visibility data poisoning attack for black-box settings in order to understand how a cautious attacker would design an FL attack. To address these challenges, we propose a Feedback-guided Causative Image Black-box Attack (F-CimBA), designed specifically for FL, which poisons data by adding random perturbation noise. F-CimBA minimizes the loss of the most confused class (i.e., the incorrect class with which the model confuses the input at the highest probability) instead of the true class, allowing it to exploit local model vulnerabilities for early attack convergence. This ensures that poisoned updates maintain low visibility, reducing the likelihood of server-side rejection. Furthermore, F-CimBA adapts effectively to non-IID data distributions and varying attack scenarios, consistently degrading the global model's performance. We also analyze its impact on system hardware metrics, highlighting the stealth and efficiency of F-CimBA given the computational overhead of repeated poisoning attempts in the FL context. Our evaluation demonstrates F-CimBA's consistent ability to poison the global model with minimal visibility under varying attack scenarios and non-IID data distributions, even in the presence of robust server-side defenses.
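The following is a minimal, illustrative sketch (not the paper's Algorithm 1) of the idea described in the abstract: a malicious client queries only its local model's outputs (black-box), identifies the most confused class, and keeps a bounded random perturbation only if feedback shows it lowers the loss on that class. The function name `fcimba_style_step` and parameters such as `mu` (perturbation bound) and `queries` are assumptions for illustration, and a PyTorch-style model returning logits is assumed.

```python
# Illustrative sketch only: feedback-guided black-box poisoning that minimizes
# the loss of the most confused class instead of maximizing the true-class loss.
import torch
import torch.nn.functional as F

def fcimba_style_step(model, x, y_true, mu=0.05, queries=20):
    model.eval()
    with torch.no_grad():  # black-box: no gradient access, only output queries
        probs = F.softmax(model(x), dim=1)
        # Most confused class: highest-probability incorrect class.
        probs_wo_true = probs.clone()
        probs_wo_true.scatter_(1, y_true.unsqueeze(1), 0.0)
        y_conf = probs_wo_true.argmax(dim=1)

        best_delta = torch.zeros_like(x)
        best_loss = F.cross_entropy(model(x), y_conf)
        for _ in range(queries):
            # Random perturbation bounded by mu for low visibility.
            delta = torch.empty_like(x).uniform_(-mu, mu)
            loss = F.cross_entropy(model(x + delta), y_conf)
            # Feedback: keep the perturbation only if it lowers the
            # confused-class loss (pushes the sample toward y_conf).
            if loss < best_loss:
                best_loss, best_delta = loss, delta
    return (x + best_delta).clamp(0.0, 1.0)
```

In this sketch, bounding the perturbation by `mu` and targeting the class the model already confuses are what keep the poisoned samples (and the resulting client update) low-visibility; the exact update rule, bounds, and convergence guarantees are those given in the paper.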
Submission Length: Long submission (more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=C3xxQVfve0
Changes Since Last Submission: We received encouraging remarks last time. The reviewers positively noted the paper's **extensive experimental evaluation across diverse datasets and aggregation algorithms**, its **comparative rigor against existing attacks**, and the fact that our **F-CimBA operates under realistic black-box assumptions**, requiring **no access to gradients, no interference with the training process or server aggregation logic**, and maintaining **low visibility throughout**. We also received recognition of the paper's **clarity of writing** and the inclusion of **comprehensive ablation studies** to support our claims. In this revised version of our manuscript, we have made substantial improvements and additions to address all major concerns raised during the previous review process, including:

- A **revised and theoretically grounded convergence analysis**, incorporating **Lipschitz continuity assumptions** to justify the impact of $\mu$-bounded perturbations and to clearly define the $\delta$-bounded convergence claim in **Section 5**.
- **Clarification of the attack methodology**, particularly how random perturbation noise is used and updated based on feedback in the black-box setting, now fully reflected in **Algorithm 1** and the corresponding explanations on **Page 8**.
- **Experiments on additional model architectures**, including **Vision Transformers (ViTs)** as well as two other architectures, **ResNet50** and **DenseNet121**, demonstrating that F-CimBA remains effective across diverse models in **Section 8.1**.
- New results under **partial participation settings** to ensure fairness and consistency across datasets, addressing concerns about mixed experimental configurations, in **Section 8.3**.
- Extended evaluations against the suggested **state-of-the-art defense mechanisms** **FABA** and **Centered Clipping**, in addition to existing defenses such as **Krum**, **LoMar**, and **FLDefender**, highlighting the robustness and stealth of F-CimBA even under strong defenses, in **Section 7.4** and the updated **Table 9**.
- A detailed response to the **broader impact concern**, emphasizing that our work is aligned with the security community's objective of proactively uncovering and mitigating vulnerabilities in federated learning systems, in **Section 8.6**.
- Clarification of ambiguities, attack scenarios (e.g., single-client vs. multi-client), and terms such as "1A", "B", and "W" that were not clearly defined in the previous submission. We have revised and proofread the entire manuscript to remove such ambiguities.

We believe that these clarifications and additional results have significantly strengthened the paper and adequately addressed all previous reviewer concerns. We deeply appreciate the opportunity to revise our manuscript and submit this new version.
Assigned Action Editor: ~Aurélien_Bellet1
Submission Number: 5590