QUEST: A 7.49TOPS multi-purpose log-quantized DNN inference engine stacked on 96MB 3D SRAM using inductive-coupling technology in 40nm CMOS

Kodai Ueyoshi; Kota Ando; Kazutoshi Hirose; Shinya Takamaeda-Yamazaki; Junichiro Kadomoto; Tomoki Miyata; Mototsugu Hamada; Tadahiro Kuroda; Masato Motomura

QUEST: A 7.49TOPS multi-purpose log-quantized DNN inference engine stacked on 96MB 3D SRAM using inductive-coupling technology in 40nm CMOS

Kodai Ueyoshi, Kota Ando, Kazutoshi Hirose, Shinya Takamaeda-Yamazaki, Junichiro Kadomoto, Tomoki Miyata, Mototsugu Hamada, Tadahiro Kuroda, Masato Motomura

Published: 01 Jan 2018, Last Modified: 12 Jun 2025ISSCC 2018EveryoneRevisionsBibTeXCC BY-SA 4.0

Abstract: A key consideration for deep neural network (DNN) inference accelerators is the need for large and high-bandwidth external memories. Although an architectural concept for stacking a DNN accelerator with DRAMs has been proposed previously, long DRAM latency remains problematic and limits the performance [1]. Recent algorithm-level optimizations, such as network pruning and compression, have shown success in reducing the DNN memory size [2]; however, since networks become irregular and sparse, they induce an additional need for agile random accesses to the memory systems.

Loading