Architecting a Flash-Based Storage System for Low-Cost Inference of Extreme-Scale DNNs

IEEE Trans. Computers, 2022 (modified: 25 Apr 2023)
Abstract: The size of deep neural network (DNN) models has been exploding rapidly, demanding a colossal amount of memory capacity. For example, Google has recently scaled its Switch Transformer to have a parameter size of up to 6.4 TB. However, today's HBM DRAM-based memory system for GPUs and DNN accelerators is suboptimal for these extreme-scale DNNs: it fails to provide enough capacity, while its massive bandwidth is poorly utilized. Thus, we propose Leviathan, a DNN inference accelerator that instead integrates a cost-effective flash-based storage system. We carefully architect the storage system to provide enough memory bandwidth while preventing the performance drop caused by read disturbance errors. Our evaluation of Leviathan demonstrates an 8.3× throughput gain compared to an iso-FLOPS DNN accelerator with conventional SSDs and up to 19.5× higher memory cost-efficiency than an HBM-based DNN accelerator.
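The abstract does not spell out the storage sizing, but a back-of-envelope calculation illustrates why flash can plausibly supply the needed bandwidth for a sparsely activated model of this scale. The sketch below is purely illustrative: the 6.4 TB figure comes from the abstract, while the active-parameter fraction, target throughput, and per-SSD bandwidth are hypothetical assumptions, not values from the paper.

def required_read_bandwidth_gbps(params_touched_per_token_gb: float,
                                 tokens_per_second: float) -> float:
    """Bandwidth needed to stream the parameters each token actually reads."""
    return params_touched_per_token_gb * tokens_per_second

# Hypothetical workload: a 6.4 TB sparsely activated model (Switch-Transformer
# scale, per the abstract) where expert routing touches ~0.1% of parameters
# per token. Both ratios below are assumptions for illustration only.
total_params_gb = 6400.0
active_fraction = 0.001        # assumption: sparse expert activation
tokens_per_second = 100.0      # assumption: target serving throughput

bw = required_read_bandwidth_gbps(total_params_gb * active_fraction,
                                  tokens_per_second)

# Assumption: ~6.4 GB/s sustained sequential read per NVMe-class SSD.
per_ssd_gbps = 6.4
print(f"Required read bandwidth: {bw:.1f} GB/s "
      f"(~{bw / per_ssd_gbps:.0f} SSDs striped in parallel)")

Under these assumed numbers, serving 100 tokens/s would require roughly 640 GB/s of aggregate read bandwidth, i.e., on the order of a hundred commodity SSDs in parallel, which is the kind of gap a purpose-built flash storage architecture like the one the paper describes would need to close.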