Lifelong Disk Failure Prediction via GAN-Based Anomaly Detection

Tianming Jiang; Jiangfeng Zeng; Ke Zhou; Ping Huang; Tianming Yang

Published: 01 Jan 2019, Last Modified: 19 Feb 2025, Venue: ICCD 2019, License: CC BY-SA 4.0

Abstract: As a classical technique in storage systems, disk failure prediction aims to predict impending disk failures in advance to ensure high data reliability. Over the past decades, many supervised machine learning algorithms that take SMART (Self-Monitoring, Analysis and Reporting Technology) attributes as input have proven effective for disk failure prediction. However, these approaches rely heavily on substantial annotated failed-disk data, which unfortunately exhibits extreme imbalance, i.e., failed disks are far fewer than healthy ones. This results in suboptimal performance and even an inability to make predictions at the beginning of deployment, i.e., the cold-start problem. Inspired by the significant success of GAN (Generative Adversarial Network) based anomaly detection, in this paper we recast disk failure prediction as an anomaly detection problem. Specifically, we develop a novel Semi-supervised method for lifelong disk failure Prediction via Adversarial training, called SPA. What distinguishes SPA from existing supervised approaches is that it is trained only on healthy disks, which avoids the traditional limitations of dataset imbalance and eliminates the cold-start problem. Furthermore, we propose a novel 2D image-like representation technique that enables the use of deep learning techniques and automatic feature extraction. Experimental results on real-world SMART datasets demonstrate that, compared with state-of-the-art supervised machine learning methods, our approach predicts disk failures with higher accuracy over the entire lifetime of the models, i.e., both the initial period and long-term usage.
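
The abstract does not specify how the 2D image-like representation or the anomaly score is computed, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes each "image" is a sliding window of consecutive daily SMART readings (rows are days, columns are attributes) scaled with statistics from healthy disks only, and it substitutes a simple deviation-from-healthy-mean score for the GAN, which is not reproduced here. All function names, the window length, and the percentile threshold are hypothetical.

```python
import numpy as np

def fit_minmax(healthy):
    """Per-attribute min and max, computed from healthy-disk samples only.
    healthy: array of shape (n_samples, n_attributes)."""
    return healthy.min(axis=0), healthy.max(axis=0)

def to_images(series, lo, hi, window):
    """Slice one SMART history (n_days, n_attributes) into overlapping
    2D windows of shape (window, n_attributes), scaled to [0, 1]."""
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant attributes
    scaled = np.clip((series - lo) / span, 0.0, 1.0)
    n = scaled.shape[0] - window + 1
    if n <= 0:
        return np.empty((0, window, scaled.shape[1]))
    return np.stack([scaled[i:i + window] for i in range(n)])

def anomaly_scores(images, reference):
    """Stand-in score (assumption): mean absolute deviation from a healthy
    reference image. A GAN-based detector would score samples differently."""
    return np.abs(images - reference).mean(axis=(1, 2))

# Synthetic data for illustration only; not real SMART values.
rng = np.random.default_rng(0)
healthy = rng.normal(50.0, 1.0, size=(200, 4))
lo, hi = fit_minmax(healthy)

train = to_images(healthy, lo, hi, window=4)
reference = train.mean(axis=0)

# The threshold is chosen from healthy data alone, so no failed disks are needed.
threshold = np.percentile(anomaly_scores(train, reference), 99)

# A hypothetical degrading disk whose attributes drift far outside the healthy range.
degraded = healthy[:10] + 100.0
flags = anomaly_scores(to_images(degraded, lo, hi, window=4), reference) > threshold
```

Because both the scaling statistics and the threshold come from healthy disks alone, a sketch like this can be fitted before any failure has been observed, which is the property the abstract associates with avoiding the cold-start problem.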
