Allegro: GPU Simulation Acceleration for Machine Learning Workloads

Published: 30 May 2024, Last Modified: 11 Jun 2024
Venue: MLArchSys 2024 (Oral)
License: CC BY 4.0
Workshop Track: System for Machine Learning
Presentation: In-Person
Keywords: GPU Simulation, GPU Kernel Sampling, GPU Simulation Acceleration, Machine Learning Workloads on GPU
Presenter Full Name: Euijun Chung
TL;DR: A statistical approach for GPU kernel sampling to accelerate GPU simulations on machine learning workloads.
Presenter Email: euijun@gatech.edu
Abstract: Current GPU simulators struggle with large machine learning workloads such as LLMs because of their slow execution. To address this issue, we leverage our observation that these workloads issue a massive number of GPU kernel calls. Given the homogeneous nature and cache-unfriendly behavior of these kernels, we demonstrate that their execution times are independently and identically distributed (i.i.d.), which allows us to apply statistical sampling approaches for accurate GPU kernel sampling. We introduce Allegro, a practical methodology for GPU simulators that significantly reduces workload size while maintaining low error. Employing a statistical measure with a recursive algorithm, we design an accurate kernel sampling scheme, supported by a proof of theoretical error bounds. By integrating Allegro into Macsim, we achieve a simulation speedup of 983.96x on 7 of the latest ML workloads with an error rate of 0.057%. Compared with other simulation acceleration techniques at a fixed speedup, Allegro's average error is 9.22x smaller than that of random sampling. Additionally, we demonstrate that relaxing the error bound enables the simulator to achieve larger speedups with only a slight increase in error. This flexibility allows researchers to easily balance desired performance and accuracy.
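The abstract does not spell out Allegro's statistical measure or its recursive algorithm, but the i.i.d. observation suggests a standard confidence-interval stopping rule. The sketch below is a hypothetical Python illustration, not the authors' implementation: it samples kernel execution times until the confidence-interval half-width of the running mean falls below a relative error bound, then extrapolates the total runtime. The function name, parameters, and z-value are illustrative assumptions.

```python
import math
import random

def sample_kernel_times(kernel_times, rel_error_bound=0.05, confidence_z=1.96,
                        min_samples=30, seed=0):
    """Hypothetical sketch: estimate total runtime of a (near-)i.i.d. kernel
    population by sampling, stopping once the confidence-interval half-width
    of the running mean drops below rel_error_bound * mean."""
    rng = random.Random(seed)
    pool = list(kernel_times)
    rng.shuffle(pool)  # random order so the prefix is an unbiased sample

    samples = []
    for t in pool:
        samples.append(t)
        n = len(samples)
        if n < min_samples:
            continue
        mean = sum(samples) / n
        var = sum((x - mean) ** 2 for x in samples) / (n - 1)  # sample variance
        half_width = confidence_z * math.sqrt(var / n)
        if half_width <= rel_error_bound * mean:
            break  # estimate is tight enough; stop simulating more kernels

    mean = sum(samples) / len(samples)
    # Extrapolate: total time ~= sampled mean * number of kernel invocations.
    return mean * len(kernel_times), len(samples)

# Usage on synthetic data: 100k kernel times from one distribution.
times = [random.gauss(5.0, 0.5) for _ in range(100_000)]
est_total, n_used = sample_kernel_times(times)
print(f"estimated total = {est_total:.1f} using {n_used} sampled kernels")
```

Under the i.i.d. assumption the abstract establishes, a stopping rule like this only needs to simulate a small prefix of the kernel calls, which is what makes large speedups at low error plausible; Allegro's actual measure and error-bound proof may differ from this simplification.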
Presenter Bio: Euijun Chung is a first-year PhD student in the School of Computer Science at Georgia Tech. He is advised by Prof. Hyesoon Kim and is broadly interested in computer architecture, with a specific focus on GPU performance and security.
Paper Checklist Guidelines: I certify that all co-authors have validated the presented results and conclusions, and have read and commit to adhering to the Paper Checklist Guidelines, Call for Papers and Publication Ethics.
Dataset Release: I certify that all co-authors commit to release the dataset and necessary scripts to reproduce the presented results.
Workshop Registration: Yes, at least one of the authors has registered for the workshop (Two-Day Registration at minimum).
Submission Number: 3