Abstract: Multi-access Edge Computing (MEC) deploys computation and storage resources at the network edge, allowing devices to have their data and requests processed by nearby edge services. This reduces data transmission latency and network congestion. However, because edge servers have volatile running status and limited resources, the reliability of the edge services deployed on them fluctuates over time. These fluctuations may cause concept drift in the real-time reliability streaming data of edge services. A severe negative drift may indicate a runtime reliability anomaly in an edge service, which often degrades users' Quality of Experience (QoE). To ensure the reliability of edge services, this paper proposes CS-Detection, a hierarchical, concept-drift-based approach for detecting runtime reliability anomalies. CS-Detection uses compressed sensing to sample the complex, large-scale reliability streaming data. It then applies E2BGAN, a new technique that combines a Variational AutoEncoder with an Energy-Based Generative Adversarial Network, to estimate the anomaly level of each edge service by calculating the reconstruction error and discriminant error of the compressed real-time reliability streaming data. To demonstrate the usefulness of CS-Detection in ensuring the QoE of MEC systems, we present CPRest, an effective coordinated checkpoint-based rejuvenation approach that restores the normal operation of edge services affected by runtime reliability anomalies. CPRest classifies detection results into four levels and adjusts the restart trigger time of edge services accordingly. Comprehensive experiments on real-world datasets demonstrate the effectiveness and efficiency of CS-Detection compared with state-of-the-art approaches.
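The abstract does not specify CS-Detection's sensing matrix, how E2BGAN's two errors are combined, or CPRest's level thresholds and restart timings. The following is a minimal Python sketch under explicitly hypothetical assumptions: a random Gaussian measurement matrix compresses a window of reliability samples (y = Φx), a weighted sum with an assumed weight `alpha` stands in for combining the reconstruction and discriminant errors, and placeholder thresholds and restart delays illustrate a four-level mapping. The VAE and E2BGAN models themselves are not implemented; their errors are taken as inputs. All names (`measurement_matrix`, `compress`, `anomaly_score`, `anomaly_level`, `RESTART_DELAY_S`) are illustrative and are not the authors' code.

```python
import math
import random

def measurement_matrix(m, n, seed=0):
    """Random Gaussian sensing matrix Phi (m x n), entries drawn from N(0, 1/m).

    Assumption: a Gaussian matrix is a common compressed-sensing choice;
    the abstract does not say which matrix CS-Detection uses.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(m)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n)] for _ in range(m)]

def compress(phi, x):
    """Return the m compressed measurements y = Phi x of a length-n window x."""
    return [sum(p * v for p, v in zip(row, x)) for row in phi]

def mse(a, b):
    """Mean squared error, e.g. between y and a model's reconstruction of y."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

def anomaly_score(recon_err, disc_err, alpha=0.5):
    """Hypothetical weighted sum of reconstruction and discriminant errors."""
    return alpha * recon_err + (1.0 - alpha) * disc_err

def anomaly_level(score, thresholds=(0.25, 0.5, 0.75)):
    """Map a score to one of four levels (0 = normal ... 3 = severe).

    The thresholds are hypothetical placeholders.
    """
    return sum(score >= t for t in thresholds)

# Hypothetical restart delays (seconds) per level; None means no restart.
RESTART_DELAY_S = {0: None, 1: 300.0, 2: 60.0, 3: 0.0}

if __name__ == "__main__":
    window = [0.99, 0.98, 0.99, 0.97, 0.60, 0.58, 0.99, 0.98]  # made-up samples
    phi = measurement_matrix(m=4, n=len(window))
    y = compress(phi, window)  # 8 samples compressed to 4 measurements
    score = anomaly_score(recon_err=0.2, disc_err=0.6)
    level = anomaly_level(score)
    print(len(y), level, RESTART_DELAY_S[level])  # prints: 4 1 300.0
```

A Gaussian matrix is used here only because such matrices satisfy the restricted isometry property with high probability when m is large enough relative to the signal's sparsity; the matrix and scoring rule in CS-Detection may differ.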
External IDs: doi:10.1109/tmc.2025.3632794