Abstract: The end-cloud collaborative computing framework preserves the security and privacy of edge-device data by enabling collaborative training of a global model without direct data exchange. In practice, however, anomalies in the training process or in edge-device data can severely degrade the global model's performance or render it unusable. Existing frameworks lack effective debugging and anomaly-localization capabilities, which makes it difficult to monitor training in real time and to precisely identify abnormal edge devices under data heterogeneity. In this paper, we propose FedCheck, a debugging framework for end-cloud collaborative federated learning that issues real-time alerts and detects abnormal devices under non-independent and identically distributed (non-IID) data without disrupting the regular training process. Specifically, we use a model-similarity-based method to quantify the degree of anomaly of each device under data heterogeneity, supporting real-time alerts during end-cloud collaboration. Furthermore, a simulation program replays the training process from recorded telemetry data, enabling backtracking debugging of any training round and of the status of each edge device. Finally, the framework removes abnormal devices and repairs the global model. Experiments on the MNIST and Fashion-MNIST datasets demonstrate that FedCheck effectively detects and locates abnormal devices under data heterogeneity, and that it maintains high detection performance and scales well even in large-scale federated learning.
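The abstract does not spell out FedCheck's similarity metric, alert threshold, or repair procedure. The snippet below is therefore only a minimal sketch of the general idea of model-similarity-based anomaly scoring followed by re-aggregation without flagged devices. It is not the authors' implementation. The choice of pairwise cosine similarity between flattened client updates, the median-absolute-deviation (MAD) threshold `k`, and the function names `anomaly_scores`, `flag_abnormal`, and `repair_aggregate` are all hypothetical assumptions made for illustration.

```python
import numpy as np


def anomaly_scores(updates):
    """Hypothetical scoring: 1 - mean cosine similarity of each client's
    flattened update to every other client's update.

    updates: array-like of shape (n_clients, n_params), n_clients >= 2.
    Higher scores mean a client is less similar to its peers.
    """
    U = np.asarray(updates, dtype=float)
    norms = np.linalg.norm(U, axis=1, keepdims=True)
    unit = U / np.clip(norms, 1e-12, None)   # guard against zero-norm updates
    sim = unit @ unit.T                      # pairwise cosine similarities
    n = U.shape[0]
    # Exclude each client's similarity with itself from its average.
    mean_sim = (sim.sum(axis=1) - np.diag(sim)) / (n - 1)
    return 1.0 - mean_sim


def flag_abnormal(scores, k=3.0):
    """Assumed alert rule: flag clients whose robust (MAD-based) z-score
    exceeds k. Comparing against peers rather than a fixed cutoff is one way
    to tolerate the natural spread of honest clients under non-IID data."""
    s = np.asarray(scores, dtype=float)
    med = np.median(s)
    mad = np.median(np.abs(s - med)) + 1e-12
    z = (s - med) / (1.4826 * mad)           # 1.4826 scales MAD to a std-dev estimate
    return np.where(z > k)[0]


def repair_aggregate(updates, weights, abnormal):
    """Weighted average (FedAvg-style) over the clients that were NOT flagged."""
    U = np.asarray(updates, dtype=float)
    keep = np.setdiff1d(np.arange(U.shape[0]), abnormal)
    if keep.size == 0:
        raise ValueError("all clients were flagged; nothing to aggregate")
    w = np.asarray(weights, dtype=float)[keep]
    return (U[keep] * w[:, None]).sum(axis=0) / w.sum()
```

In a round where one client submits a sign-flipped update, that client's score sits far above the median of its peers and is flagged. The repaired aggregate is then computed from the remaining clients only. Under strongly non-IID data, honest clients also diverge from one another, so a real system would need to calibrate the threshold (or the similarity reference) per deployment.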
External IDs: dblp:journals/jcsc/Kong0FPWZZ25