Fast DL-based Simulation with Microarchitecture Agnostic Traces and Instruction Embeddings

Published: 30 May 2024, Last Modified: 14 Jun 2024. MLArchSys 2024 Oral. License: CC BY 4.0
Workshop Track: Machine Learning for Systems
Presentation: Virtual
Keywords: Computer architecture simulation, deep learning, program embeddings, transfer learning
Presenter Full Name: Santosh Pandey
TL;DR: The paper introduces a deep-learning-based microarchitecture simulator that supports detailed, accurate, and fast microarchitecture exploration, reducing combined training and simulation time by 18.06x over the state of the art.
Presenter Email: santosh.pandey@rutgers.edu
Abstract: Microarchitecture simulators are indispensable tools for microarchitecture designers to validate, estimate, optimize, and manufacture new hardware that meets specific design requirements. While the quest for fast, accurate, and detailed microarchitecture simulation has been ongoing for decades, existing simulators excel and fall short in different aspects: (i) Execution-driven simulation is accurate and detailed, but it is extremely slow and requires expert-level experience to design. (ii) Trace-driven simulation reuses execution traces in pursuit of fast simulation, but it faces accuracy concerns and fails to achieve significant speedup. (iii) Emerging deep learning (DL)-based simulations are remarkably fast and achieve acceptable accuracy, but they fail to provide adequate low-level microarchitectural performance metrics, such as branch mispredictions or cache misses, which are crucial for microarchitectural bottleneck analysis; they also introduce substantial overheads from trace regeneration and model re-training when simulating a new microarchitecture. Rethinking the advantages and limitations of these three mainstream simulation paradigms, this paper introduces TAO, which redesigns DL-based simulation with three primary contributions: First, we propose a new training dataset design such that the subsequent simulation (i.e., inference) needs only functional traces as inputs, which can be generated rapidly and reused across microarchitectures. Second, to increase the detail of the simulation, we redesign the input features and the DL model using self-attention to support predicting various performance metrics of interest. Third, we propose techniques to train a microarchitecture-agnostic embedding layer that enables fast transfer learning between different microarchitectural configurations and effectively reduces the re-training overhead of conventional DL-based simulators. TAO can predict various performance metrics of interest, significantly reduce the simulation time, and maintain simulation accuracy similar to state-of-the-art DL-based endeavors. Our extensive evaluation shows that TAO reduces the overall training and simulation time by 18.06$\times$ over the state-of-the-art DL-based endeavors.
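To make the abstract's second and third contributions concrete, the following is a minimal PyTorch-style sketch, not TAO's actual code: a self-attention model over instruction embeddings derived from a functional trace, predicting several performance metrics at once, with a shared microarchitecture-agnostic embedding layer that is frozen when transferring to a new configuration. All class names, dimensions, and the particular metric list are illustrative assumptions rather than details taken from the paper.

```python
# Hypothetical sketch of the idea described in the abstract; not TAO's implementation.
import torch
import torch.nn as nn

class TraceSimulator(nn.Module):
    def __init__(self, vocab_size=4096, dim=128, n_heads=4, n_layers=2,
                 n_metrics=3):  # e.g., cycles, branch mispredictions, cache misses (assumed)
        super().__init__()
        # Microarchitecture-agnostic instruction embedding: trained once,
        # then frozen and reused when transferring to a new configuration.
        self.embed = nn.Embedding(vocab_size, dim)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        # Microarchitecture-specific head: re-trained per target configuration.
        self.head = nn.Linear(dim, n_metrics)

    def forward(self, trace_tokens):
        # trace_tokens: (batch, seq_len) instruction IDs from a functional trace
        x = self.embed(trace_tokens)
        x = self.encoder(x)
        return self.head(x.mean(dim=1))  # one set of metric predictions per trace window

# Transfer to a new microarchitecture: freeze the shared embedding and
# fine-tune only the remaining layers, reducing re-training overhead.
model = TraceSimulator()
for p in model.embed.parameters():
    p.requires_grad = False
preds = model(torch.randint(0, 4096, (8, 256)))  # dummy batch of trace windows
```

The key design choice illustrated here is the split between a reusable, microarchitecture-agnostic front end (the embedding) and a lightweight microarchitecture-specific back end, which is what makes fast transfer learning across configurations plausible.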
Presenter Bio: Santosh is a final-year PhD student at Rutgers University advised by Prof. Hang Liu. Santosh's research interests lie at the intersection of machine learning and computer systems, focusing on two main themes: systems for machine learning and machine learning for systems. In the former, his focus is on harnessing hardware-software co-design and high-performance computing to accelerate diverse algorithms and ML applications. In the latter, his focus is on leveraging ML techniques for performance modeling and microarchitecture design space exploration.
Paper Checklist Guidelines: I certify that all co-authors have validated the presented results and conclusions, and have read and commit to adhering to the Paper Checklist Guidelines, Call for Papers and Publication Ethics.
Workshop Registration: Yes, at least one of the authors has registered for the workshop (Two-Day Registration at minimum).
Submission Number: 17