Abstract: In a distributed system, identifying consistent checkpoints is essential for error recovery and debugging. We design an efficient incremental algorithm that identifies all consistent and removable checkpoints each time a new checkpoint is reported. Discarding the removable checkpoints as they are identified minimizes the required memory space. While keeping memory usage minimal, the algorithm requires only O(p^2 M) time in total, where p is the number of processes and M is the number of checkpoints.