Abstract: Understanding the sources of performance variability is important for developers and users of high-performance applications. In this paper, we discuss non-I/O sources of application performance variability on Cori, a Cray XC40 at NERSC with 9600+ Xeon Phi nodes connected by an Aries high-speed network with a Dragonfly topology. Our survey covers on-node variability, caused by MCDRAM configured as cache and by clock frequency scaling, as well as off-node variability caused by the network. For each source, we quantify the variability through micro-benchmarks and mini-applications, discuss potential mitigation strategies, and analyze the impact on applications.