CARM: CUDA-Accelerated RNS Multiplication in Word-Wise Homomorphic Encryption Schemes for Internet of Things

Abstract: Homomorphic encryption (HE), which allows computation over encrypted data, is often used to preserve privacy. However, the heavy computational cost of HE and the complexity of network topologies make it difficult to deploy HE schemes in Internet of Things (IoT) scenarios. In this work, we propose CARM, the first optimized GPU implementation that covers BGV, BFV, and CKKS, targeting the acceleration of homomorphic multiplication on GPUs in heterogeneous IoT systems. Our solution is suitable for accelerating RNS homomorphic multiplication on both high-performance and embedded GPUs, because its design is parametric and generic and offers various trade-offs between resource usage and efficiency. We offer constant-time low-level arithmetic with minimal instruction counts and memory usage, as well as performance-prioritized and memory-prioritized configurations. With these, evaluation results can be delivered closer to real time, and the computational pressure on cloud devices is relieved. We deploy our implementations on two GPUs. Compared to the CPU implementation, we achieve up to $378.4\times$, $234.5\times$, and $287.2\times$ speedup for homomorphic multiplication of BGV, BFV, and CKKS, respectively, on Tesla V100S, and up to $8.8\times$, $9.2\times$, and $10.3\times$, respectively, on Jetson AGX Xavier.
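The abstract does not spell out how RNS multiplication works, so the following is only a minimal, generic sketch of the idea and not CARM's GPU kernels. In a residue number system (RNS), a large value is stored as its residues modulo several pairwise coprime moduli, and multiplication is done independently per modulus, which is what makes the operation map naturally onto parallel hardware. The function names and the tiny moduli below are assumptions chosen for illustration; real word-wise HE schemes apply this to polynomial coefficients with word-sized prime moduli.

```python
from math import prod

def to_rns(x, moduli):
    """Represent integer x by its residues modulo each modulus."""
    return [x % q for q in moduli]

def rns_mul(a, b, moduli):
    """Multiply two RNS values channel by channel; channels are independent."""
    return [(ai * bi) % q for ai, bi, q in zip(a, b, moduli)]

def from_rns(residues, moduli):
    """Recombine residues with the Chinese Remainder Theorem."""
    big_q = prod(moduli)
    total = 0
    for r, q in zip(residues, moduli):
        q_hat = big_q // q                  # product of all other moduli
        total += r * q_hat * pow(q_hat, -1, q)  # modular inverse, Python 3.8+
    return total % big_q

# Illustrative (hypothetical) parameters: three small pairwise coprime primes.
MODULI = [97, 101, 103]
x, y = 1234, 567

product = from_rns(rns_mul(to_rns(x, MODULI), to_rns(y, MODULI), MODULI), MODULI)
print(product == (x * y) % prod(MODULI))
```

Because each residue channel in `rns_mul` depends only on its own modulus, the channels could in principle be assigned to separate GPU threads or blocks; how CARM actually schedules them is not stated in the abstract.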