Abstract: In edge computing environments, service reliability is often threatened by sudden edge-node failures caused by harsh deployment conditions, which interrupt tasks and degrade performance. To address this challenge, EFTR-HGNet is proposed as a novel task rescheduling framework tailored to edge-fault scenarios. It combines heterogeneous graph neural networks with a Transformer-based architecture to achieve cost-efficient task migration and adaptive decision making. Specifically, the rescheduling problem is formulated as a Markov decision process (MDP), and a 3-D fault-aware state representation is introduced that jointly encodes task attributes, resource availability, and dynamic failure status. To model the complex relationships between failed tasks and heterogeneous edge resources, a heterogeneous Transformer (HG-Trans) network is designed; it performs a two-stage embedding over the constructed graph so that the agent can make context-aware rescheduling decisions. By optimizing both the policy and value functions within an Actor-Critic reinforcement learning framework, the method achieves a favorable balance between minimizing the overall makespan and maximizing the task rescheduling success rate. Evaluated against strong baselines, including the heterogeneous earliest finish time (HEFT) algorithm, the task replication and cluster-based scheduling algorithm (TDCA), and FixDoc, EFTR-HGNet outperformed them, achieving a makespan reduction of at least 11.11% and a 4.20% increase in the task rescheduling success rate. These results highlight its robustness and practical potential for fault-prone edge computing systems.
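The abstract describes an Actor-Critic agent whose policy picks where an interrupted task goes and whose value function is trained alongside it. As a minimal sketch of that training signal only, the pure-Python example below replaces the HG-Trans graph encoder with hypothetical linear scorers over hand-made node features and uses a hypothetical reward that penalizes makespan growth and rewards successful rescheduling. The feature layout, the names `reward_fn` and `LinearActorCritic`, the weights `alpha`/`beta`, and the learning rates are all illustrative assumptions, not the paper's design.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def reward_fn(finish_time_increase, rescheduled_ok, alpha=1.0, beta=1.0):
    # Hypothetical reward: penalize makespan growth, reward (or penalize) success.
    return -alpha * finish_time_increase + (beta if rescheduled_ok else -beta)


class LinearActorCritic:
    """Hypothetical linear stand-in for the HG-Trans actor and critic heads."""

    def __init__(self, n_node_feats, n_state_feats, actor_lr=0.1, critic_lr=0.1, gamma=0.99):
        self.theta = [0.0] * n_node_feats  # actor: scores each candidate edge node
        self.w = [0.0] * n_state_feats     # critic: estimates V(state)
        self.actor_lr, self.critic_lr, self.gamma = actor_lr, critic_lr, gamma

    def policy(self, node_feats):
        # pi(a | s): softmax over candidate edge nodes
        return softmax([dot(self.theta, f) for f in node_feats])

    def value(self, state_feats):
        return dot(self.w, state_feats)

    def update(self, state_feats, node_feats, action, reward, next_state_feats, done):
        # The one-step TD error serves as the advantage estimate.
        v_next = 0.0 if done else self.value(next_state_feats)
        td_error = reward + self.gamma * v_next - self.value(state_feats)
        # Actor: grad log pi(a) = phi(a) - E_pi[phi] for a linear-softmax policy.
        probs = self.policy(node_feats)
        n = len(self.theta)
        expected = [sum(p * f[k] for p, f in zip(probs, node_feats)) for k in range(n)]
        grad = [node_feats[action][k] - expected[k] for k in range(n)]
        self.theta = [t + self.actor_lr * td_error * g for t, g in zip(self.theta, grad)]
        # Critic: semi-gradient TD(0) step.
        self.w = [wi + self.critic_lr * td_error * s for wi, s in zip(self.w, state_feats)]
        return td_error


# Toy setting: one interrupted task, two candidate nodes.
# Hypothetical node features: [is_alive, is_failed].
NODES = [[0.0, 1.0], [1.0, 0.0]]  # node 0 has failed, node 1 is alive
STATE = [1.0]                      # bias-only state for the critic

rng = random.Random(0)
agent = LinearActorCritic(n_node_feats=2, n_state_feats=1)
for _ in range(500):
    probs = agent.policy(NODES)
    a = rng.choices(range(len(NODES)), weights=probs)[0]
    ok = NODES[a][0] == 1.0
    r = reward_fn(finish_time_increase=0.2 if ok else 0.0, rescheduled_ok=ok)
    agent.update(STATE, NODES, a, r, STATE, done=True)

print(agent.policy(NODES))  # probability mass shifts toward the alive node
```

Because the TD error acts as the advantage, both outcomes push the policy the same way: picking the failed node yields a negative advantage that lowers its score, and picking the live node yields a positive one that raises it. In a full system these linear scores would presumably be replaced by the graph embeddings that the heterogeneous network produces.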
External IDs: dblp:journals/iotj/YaoWL25