Abstract: Homomorphic Encryption (HE) is a promising technique for guaranteeing the security and privacy of Machine Learning (ML) applications in the cloud. Rotation is a key operation in HE-based ML; however, its high computational complexity and memory bandwidth requirements severely limit performance. This work proposes a low-latency HE rotation accelerator targeting HBM-enabled FPGAs. First, we identify memory inefficiencies caused by the access patterns of the various sub-routines in rotation. We then propose a dynamic data layout technique that converts large-stride memory accesses into unit-stride accesses to improve bandwidth utilization. We leverage this technique to develop an FPGA accelerator that supports rotation for various HE parameter settings. The accelerator uses an optimized dataflow and an architecture specifically designed to perform the dynamic data layout. We evaluate the accelerator on an AMD U280 FPGA. Our design achieves up to 2.1x speedup compared with two commonly used static layout approaches and up to 1.47x speedup compared with a state-of-the-art GPU implementation across various rotation benchmarks.
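The idea of converting large-stride accesses to unit-stride accesses can be illustrated with a toy sketch. This is not the accelerator's actual layout scheme or dataflow, which the abstract does not detail; it only assumes a coefficient array stored row-major as a `rows x cols` matrix, with a sub-routine that reads one column at a time. All function names and sizes here are hypothetical.

```python
# Toy illustration (assumed setup, not the paper's design): changing the
# layout of a row-major rows x cols array so that a column read, which is
# a stride-`cols` access, becomes a unit-stride access.

def strided_addresses(rows, cols, col):
    """Flat addresses touched when reading one column of a row-major array."""
    return [r * cols + col for r in range(rows)]

def transpose_layout(flat, rows, cols):
    """Re-lay out a row-major rows x cols array as row-major cols x rows."""
    return [flat[r * cols + c] for c in range(cols) for r in range(rows)]

def unit_stride_addresses(rows, col):
    """Addresses of the same logical column after the layout change."""
    return [col * rows + r for r in range(rows)]

rows, cols = 4, 8                      # hypothetical sizes
data = list(range(rows * cols))        # stand-in for polynomial coefficients

before = strided_addresses(rows, cols, 3)   # consecutive addresses differ by cols
relaid = transpose_layout(data, rows, cols)
after = unit_stride_addresses(rows, 3)      # consecutive addresses differ by 1

# Both address sequences fetch the same logical elements.
assert [data[a] for a in before] == [relaid[a] for a in after]
```

In this sketch the layout change is a full transpose performed in software; the point is only that the same logical elements end up at consecutive addresses, which is the property that lets a memory system such as HBM serve them as long bursts.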