GRAAFE: GRaph Anomaly Anticipation Framework for Exascale HPC systems

Published: 01 Jan 2024, Last Modified: 27 Jul 2025
Venue: Future Gener. Comput. Syst. 2024
License: CC BY-SA 4.0
Abstract: Highlights
• Based on telemetry data, the GRAAFE framework predicts compute node availability.
• It is the first HPC anomaly prediction framework based on graph neural networks.
• GRAAFE is a full-scale ML-ops framework for anomaly prediction in HPC.
• It requires an additional 30% CPU and 5% more RAM compared to monitoring only.