Achieving Consistent and Comparable CPU Evaluation Outcomes

Published: 01 Jan 2024 · Last Modified: 28 Jul 2025 · CoRR 2024 · License: CC BY-SA 4.0
Abstract: The challenge of CPU evaluation lies in the fact that user-perceived performance metrics can only be measured on an independently running system consisting of the CPU and other indispensable components, making it difficult to accurately attribute deviations in evaluation outcomes to differences between the CPUs. Our experiments reveal that the industry-standard CPU benchmark, SPEC CPU2017, suffers from a significant flaw: for an identical CPU, undefined configurations of the other indispensable components introduce uncontrolled variability into evaluation outcomes. We propose a rigorous CPU evaluation methodology. Through theoretical analysis and pioneering controlled experiments, we systematically compare our methodology against four established methodologies: SPEC CPU2017, two Design of Experiments (DOE) variants, and a randomized controlled trial (RCT) approach. The results show that our methodology achieves consistent and comparable evaluation outcomes, while the others exhibit inherent limitations.
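As a purely illustrative sketch (not the paper's methodology), the Python snippet below shows one way the confounding described above could be made visible: record the configuration of the other indispensable components alongside each benchmark score, and treat two scores as comparable only when the recorded environments match. The function names, the recorded fields, and the exact-match comparison rule are all assumptions for illustration.

```python
# Hypothetical illustration only: attach an environment manifest to each
# benchmark result so that scores from differently configured systems are
# never compared as if they measured the CPU alone.
import json
import platform
import subprocess


def capture_environment() -> dict:
    """Record components that can confound a CPU-to-CPU comparison."""
    env = {
        "cpu": platform.processor(),
        "machine": platform.machine(),
        "os": platform.platform(),  # kernel / OS build
    }
    try:
        # Compiler version matters for source-based suites like SPEC CPU2017.
        env["compiler"] = subprocess.check_output(
            ["gcc", "--version"], text=True
        ).splitlines()[0]
    except (OSError, subprocess.CalledProcessError):
        env["compiler"] = "unknown"
    return env


def comparable(run_a: dict, run_b: dict) -> bool:
    """Treat two scores as comparable only if every recorded field matches."""
    return run_a["env"] == run_b["env"]


if __name__ == "__main__":
    # "example-workload" and the score are placeholders, not real results.
    result = {"benchmark": "example-workload", "score": 0.0,
              "env": capture_environment()}
    print(json.dumps(result, indent=2))
```

Under this sketch's assumptions, two identical CPUs benchmarked with different compilers or OS builds would produce manifests that fail the `comparable` check, surfacing exactly the uncontrolled variability the abstract attributes to undefined component configurations.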