Abstract: Highlights
• Based on telemetry data, the GRAAFE framework predicts compute node availability.
• It is the first HPC anomaly prediction framework based on graph neural networks.
• GRAAFE is a full-scale ML-ops framework for anomaly prediction in HPC.
• It requires 30% more CPU and 5% more RAM than monitoring alone.