Abstract: Split Learning (SL) is a widely adopted distributed privacy-preserving training paradigm that imposes minimal computational overhead on clients. However, the Feature-Space Hijacking Attack (FSHA) poses a significant threat to SL, in which the server manipulates the client's optimization process and compromises input privacy. Some studies propose that clients can detect potential hijacking by monitoring the gradients returned by the server, but these gradient-based methods are vulnerable to adversarial anti-detection and lack robustness to changes in model architecture. In this paper, we propose a novel detection method named Bidirectional Feature Discrepancy Defense (BiFD), which leverages features to capture richer semantic information. We also observe that hijacked features are easier to reconstruct and harder to classify, a key distinction between malicious and honest servers that previous works overlook. Extensive results across multiple datasets and model architectures demonstrate the excellent and robust performance of BiFD.
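The abstract's key observation is that hijacked features tend to be easy to reconstruct but hard to classify. The following is a minimal illustrative sketch of that intuition only, not the paper's BiFD algorithm: a client fits a small local decoder and a linear probe on its own smashed features and compares the two losses. All module names, architectures, thresholds, and shapes (e.g., `ClientEncoder`, `probe_features`, 32x32 inputs) are assumptions introduced for illustration.

```python
# Illustrative sketch (assumed setup, not the paper's BiFD method): contrast how
# easily the client's smashed features can be reconstructed versus classified.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ClientEncoder(nn.Module):
    """Toy client-side model up to the cut layer (assumed architecture)."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
    def forward(self, x):
        return self.conv(x)  # smashed features that would be sent to the server

def probe_features(encoder, images, labels, num_classes=10, steps=50, lr=1e-2):
    """Fit a small decoder and a linear classifier on detached features and
    return (reconstruction_loss, classification_loss): hijacked features tend
    to reconstruct easily but classify poorly."""
    with torch.no_grad():
        feats = encoder(images)          # (B, 32, 8, 8) for 32x32 inputs
    flat = feats.flatten(1)

    decoder = nn.Sequential(
        nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
        nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1),
    )
    classifier = nn.Linear(flat.shape[1], num_classes)
    opt = torch.optim.Adam(list(decoder.parameters()) + list(classifier.parameters()), lr=lr)

    for _ in range(steps):
        opt.zero_grad()
        rec_loss = F.mse_loss(decoder(feats), images)
        cls_loss = F.cross_entropy(classifier(flat), labels)
        (rec_loss + cls_loss).backward()
        opt.step()
    return rec_loss.item(), cls_loss.item()

if __name__ == "__main__":
    enc = ClientEncoder()
    x = torch.rand(32, 3, 32, 32)        # stand-in for a private client batch
    y = torch.randint(0, 10, (32,))
    rec, cls = probe_features(enc, x, y)
    # Heuristic flag: low reconstruction error combined with high classification
    # error (relative to an honest-training baseline) suggests possible hijacking.
    print(f"reconstruction loss: {rec:.4f}, classification loss: {cls:.4f}")
```

In this hypothetical setup, the client would calibrate the two losses during a known-honest run and flag training rounds where the discrepancy between them shifts sharply toward easy reconstruction and poor classification.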