Abstract: Accurate job finish time estimation is a key component of scheduling strategy design in supercomputing systems. Existing work concentrates on designing better or more complex machine learning models to predict job runtime from non-job-specific parameters, such as the number of processors used, the user-estimated runtime, the job submission time, and the job ID. However, the system logs contain more information that can assist runtime prediction. In supercomputing, these logs always contain a job's intermediate output and input parameters, which motivates us to analyze the running status of a job and predict its finish time. Since VASP is one of the most popular supercomputing applications in the world, in this paper we conduct the first investigation into the running features of VASP jobs and analyze their job-specific parameters in depth. Based on these running and job-specific features, we propose RunningNet, a model that dynamically predicts the finish time while a job is running by combining running features, represented as a time series, with job-parameter features. Experiments on the VASP job set from the supercomputing system at USTC show that RunningNet achieves state-of-the-art results, with a Mean Absolute Percentage Error (MAPE) of about 10.3%.
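For readers unfamiliar with the metric, the sketch below computes MAPE using its standard definition, $\text{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$, where $y_i$ is the actual finish time and $\hat{y}_i$ the prediction. This is an illustrative assumption: the abstract does not spell out the exact formula used in the evaluation, and the finish times below are hypothetical values, not data from the USTC job set.

```python
from typing import Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean Absolute Percentage Error, returned in percent (standard definition)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if len(actual) == 0:
        raise ValueError("inputs must be non-empty")
    total = 0.0
    for y, y_hat in zip(actual, predicted):
        # MAPE divides by the actual value, so it is undefined when y == 0.
        if y == 0:
            raise ValueError("MAPE is undefined when an actual value is zero")
        total += abs(y - y_hat) / abs(y)
    return 100.0 * total / len(actual)


# Hypothetical job finish times in seconds (illustration only).
actual = [3600.0, 7200.0, 1800.0]
predicted = [3960.0, 6480.0, 1800.0]
print(f"MAPE: {mape(actual, predicted):.1f}%")  # prints "MAPE: 6.7%"
```

Because each error is scaled by the job's own runtime, a 6-minute miss on a 1-hour job counts the same as a 12-minute miss on a 2-hour job, which suits job sets whose runtimes span several orders of magnitude.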