Abstract: As scientific applications become increasingly data-intensive, I/O and storage components can easily become the bottlenecks of high performance computing (HPC) systems. This trend has driven the recent development of complex, layered HPC I/O software stacks, which expose a large number of tunable parameters for achieving good I/O performance. Application developers therefore need to understand how their applications are projected to perform under a chosen set of parameters, as well as how those choices may affect other concurrent applications. Although a number of recent studies model the relationship between application I/O performance and these parameters to provide such insight, the complexity of HPC systems and the interactions between concurrent applications leave existing approaches oversimplified and far from practical. In this paper, we address this problem using a deep learning model, long short-term memory (LSTM), together with the full history of the I/O settings of all concurrent applications in the system. As a proof of concept, we carry out extensive experiments using the IOR benchmark and train the model on real system logs. Empirical results show that our LSTM-based model predicts write speed more accurately than state-of-the-art approaches. We hope this work can initiate further study toward more accurate and practical I/O performance estimation in HPC systems.
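To make the idea of feeding a history of I/O settings to an LSTM concrete, the following is a minimal sketch of an LSTM forward pass in plain Python. It is not the model from the paper: the feature layout (four normalized values per time step), the hidden size, the random untrained weights, and the names `init_lstm`, `lstm_step`, and `predict_write_speed` are all assumptions made for illustration. A real model would be trained on system logs as the abstract describes.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_lstm(input_size, hidden_size, seed=0):
    """Create small random (untrained) weights; hypothetical helper."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    params = {}
    # One weight matrix and bias per gate:
    # input (i), forget (f), output (o), candidate (g).
    for gate in "ifog":
        params[gate] = {
            "W": mat(hidden_size, input_size + hidden_size),
            "b": [0.0] * hidden_size,
        }
    # Linear read-out from the final hidden state to one scalar.
    params["out"] = {
        "w": [rng.uniform(-0.1, 0.1) for _ in range(hidden_size)],
        "b": 0.0,
    }
    return params


def matvec(W, v, b):
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def lstm_step(params, x, h, c):
    """One standard LSTM cell update for input x, hidden state h, cell state c."""
    z = x + h  # concatenate the current input with the previous hidden state
    i = [sigmoid(a) for a in matvec(params["i"]["W"], z, params["i"]["b"])]
    f = [sigmoid(a) for a in matvec(params["f"]["W"], z, params["f"]["b"])]
    o = [sigmoid(a) for a in matvec(params["o"]["W"], z, params["o"]["b"])]
    g = [math.tanh(a) for a in matvec(params["g"]["W"], z, params["g"]["b"])]
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new


def predict_write_speed(params, history):
    """Run the LSTM over the whole history and read out one scalar."""
    hidden_size = len(params["out"]["w"])
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in history:
        h, c = lstm_step(params, x, h, c)
    return sum(w * hj for w, hj in zip(params["out"]["w"], h)) + params["out"]["b"]


# Hypothetical history: one normalized feature vector of I/O settings per
# time step, ordered from oldest to newest.
history = [
    [0.5, 0.25, 0.1, 0.4],
    [0.6, 0.25, 0.2, 0.4],
    [0.9, 0.50, 0.2, 0.8],
]
params = init_lstm(input_size=4, hidden_size=8)
estimate = predict_write_speed(params, history)
```

Because the cell state is carried from one step to the next, the output depends on the order of the history and not only on its most recent entry, which is the property that makes a recurrent model a natural fit for a full log of settings. With untrained weights the value of `estimate` carries no meaning.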