Abstract: We propose Critical Datapath Length (CDL), an interpretable metric for neural-network models that enables accurate prediction of execution time on parallel device architectures. CDL addresses the fact that a model's total number of floating-point operations (FLOPs) is an inconsistent predictor of real execution time, because tensor operations and hardware accelerators are highly parallel. Our results show that, on GPUs, CDL correlates with execution time significantly better than FLOPs does, making it a useful performance predictor.
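
The abstract does not define how CDL is computed, so the following is only an illustrative sketch of the general idea it motivates, not the authors' method. It assumes a model can be viewed as a dependency graph of operations with a per-operation cost, and it compares the summed cost (a FLOPs-like total) with the cost of the longest dependency chain (a critical-path-like length). The names `total_cost`, `critical_path_length`, and the toy graphs are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def total_cost(costs):
    """Sum of all per-operation costs (a FLOPs-style total)."""
    return sum(costs.values())


def critical_path_length(costs, deps):
    """Cost of the longest dependency chain in an operation DAG.

    costs: maps each operation name to its (assumed) cost.
    deps:  maps an operation name to the operations it depends on.
    """
    graph = {op: deps.get(op, ()) for op in costs}
    finish = {}
    # static_order() yields every operation after all of its dependencies.
    for op in TopologicalSorter(graph).static_order():
        start = max((finish[p] for p in graph[op]), default=0)
        finish[op] = start + costs[op]
    return max(finish.values(), default=0)


# Two toy graphs with identical total cost but different shapes.
costs = {"a": 4, "b": 4, "c": 4, "d": 4}
chain = {"b": ["a"], "c": ["b"], "d": ["c"]}  # a -> b -> c -> d
wide = {"d": ["a", "b", "c"]}                 # a, b, c feed d in parallel

print(total_cost(costs))                   # 16 for both graphs
print(critical_path_length(costs, chain))  # 16
print(critical_path_length(costs, wide))   # 8
```

Both toy graphs have the same total cost, yet the wide graph has half the longest-chain cost, because three of its operations have no dependencies on one another and could in principle run concurrently. This is the kind of gap between total work and achievable parallel time that the abstract attributes to FLOPs-based prediction.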