A High-Throughput Private Inference Engine Based on 3D Stacked Memory

Published: 01 Jan 2024, Last Modified: 14 Apr 2025, Venue: DAC 2024, License: CC BY-SA 4.0
Abstract: Fully Homomorphic Encryption (FHE) enables unlimited computation depth, allowing privacy-enhanced neural network inference directly on ciphertext. However, existing FHE architectures suffer from a memory access bottleneck. This work proposes H3, a High-throughput FHE engine for private inference (PI) based on 3D stacked memory. H3 adopts a software-hardware co-design that dynamically adjusts the polynomial decomposition during the PI process to minimize computation and storage overhead at a fine granularity. Using 3D hybrid bonding, H3 integrates a logic die with a multi-layer embedded DRAM and routes data to the processing unit array through an efficient broadcast mechanism. H3 occupies 192 mm² when implemented in a 28 nm logic process. It achieves 1.36 million LeNet-5 or 920 ResNet-20 private inferences per minute, surpassing existing 7 nm accelerators by 52%. This demonstrates that 3D memory is a promising technology for improving the performance of FHE.