Accurate Generation of I/O Workloads Using Generative Adversarial Networks

Published: 01 Jan 2024, Last Modified: 15 May 2025, Venue: NAS 2024, License: CC BY-SA 4.0
Abstract: Analyzing commodity system performance or simulating scientific phenomena in high-performance scientific computing requires a large number of I/O workloads. However, I/O traces are often unavailable at scale due to trace storage overhead, privacy concerns, and the performance impact of trace instrumentation. We study how to generate sufficiently representative I/O workloads using Generative Adversarial Networks (GANs). The best GAN architecture generates I/O workloads with a maximum mean discrepancy (MMD) as low as 0.015-0.05 relative to real traces, which implies that the model has successfully learned the underlying distribution of real I/O traces. Trace replay shows that the performance similarity between the original I/O trace and the generated I/O workload reaches 90.36%-97.32%.
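For readers unfamiliar with the metric, the following is a minimal sketch of how an MMD score between real and synthetic samples can be estimated. The abstract does not state which kernel, estimator, or trace features the authors use, so everything here is an assumption made for illustration: a Gaussian (RBF) kernel with bandwidth `sigma`, the biased estimator of squared MMD, and hypothetical two-dimensional feature vectors standing in for I/O requests.

```python
import math
import random


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel between two equal-length feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mean_kernel(xs, ys, sigma=1.0):
    """Average kernel value over all pairs drawn from xs and ys."""
    total = sum(gaussian_kernel(x, y, sigma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(real, synthetic, sigma=1.0):
    """Biased estimate of squared MMD between two samples.

    Values near zero suggest the two samples come from similar
    distributions; larger values indicate a mismatch.
    """
    return (mean_kernel(real, real, sigma)
            + mean_kernel(synthetic, synthetic, sigma)
            - 2.0 * mean_kernel(real, synthetic, sigma))


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical features, e.g. (normalized size, normalized inter-arrival time).
    real = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100)]
    good = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100)]
    poor = [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(100)]
    print("MMD^2, well-matched generator:", mmd_squared(real, good))
    print("MMD^2, poorly matched generator:", mmd_squared(real, poor))
```

With this estimator, a sample compared against itself scores zero and a sample from a shifted distribution scores noticeably higher. The absolute values depend on the kernel and bandwidth, so they are not directly comparable to the 0.015-0.05 range reported in the abstract.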