To Stack or Not To Stack

Richard Afoakwa, Lejie Lu, Hui Wu, Michael C. Huang

Published: 2019, Last Modified: 27 Sept 2024, PACT 2019, CC BY-SA 4.0

Abstract: 3D memory technology, such as Micron's hybrid memory cube (HMC), has re-energized the architectural pursuit of computation very close to, or inside, the memory chip. Such a design falls into the broader category of near-data processing (NDP). The motivation for such a design is that the current von Neumann architecture of chip-multiprocessors is thought to make data movement expensive. Current NDP work focuses on the possibility of architecting computation engines, such as accelerators, cores, or graphics processing units, right below the memory layers and inside the logic layer of the HMC sub-system. However, such a stacking design presents a number of technical challenges, such as heat dissipation and power supply. While these challenges can certainly be overcome, and need to be addressed, in this work we seek to answer a related question: whether it is necessary to stack general-purpose computation engines directly inside the memory unit in order to achieve the performance potential of an NDP system; thus, to stack or not to stack. We show that, with the computing models used in current NDP designs, placing the computation engines very close to, but outside, the memory system (not stacking) can provide comparable performance without significant energy costs. This can be achieved without inventing any new technology, by utilizing current state-of-the-art high-speed link design practices.