PoDD: power-capping dependent distributed applications

2019 (modified: 02 Nov 2022), SC 2019
Abstract: Power budgeting (or capping) has become essential for large-scale computing installations. Meanwhile, as these systems scale out, they can concurrently execute dependent applications that were previously processed serially. Such application coupling reduces I/O traffic and overall time to completion because the applications communicate at runtime instead of through disk. Coupled applications are predicted to be a major workload for future exascale supercomputers; e.g., scientific simulations will execute concurrently with in situ analysis. One critical challenge for power budgeting systems is implementing power capping for coupled applications while still achieving high performance. Existing approaches to power capping coupled workloads, however, have major limitations, including (1) poor practicality, due to dependence on offline application profiling; and (2) limited optimization opportunity, as they consider power reallocation only at the global level (node to node) and ignore node-level optimization opportunities. To overcome these limitations, we propose PoDD, a hierarchical, distributed power management system for coupled applications. PoDD uses classifiers and online model building to determine optimal power and performance tradeoffs without offline profiling or application instrumentation. We implement PoDD on a 49-node cluster and compare it to SLURM, a state-of-the-art job scheduler that considers power but not coupling, and to PowerShift, a power capping system for coupled applications that lacks node-level optimization. PoDD improves mean performance over SLURM by 14–22% and over PowerShift by 11–13%. Finally, PoDD is resilient to tail behavior and system noise, improving performance in noisy environments by 44% on average compared to an even power distribution.