Abstract: Homomorphic Encryption (HE) offers a promising solution for privacy-preserving Graph Convolutional Network
(GCN) inference in untrusted cloud environments by enabling computation directly on encrypted data. This
capability is particularly valuable in domains such as recommendation systems, financial analysis, and
bioinformatics, where data confidentiality is paramount. However, applying HE to large-scale GCN inference introduces
substantial computational and memory overhead, severely limiting scalability and runtime efficiency. While prior
works focusing on algorithmic improvements have demonstrated feasibility on CPUs, these approaches struggle
to scale effectively on GPUs due to excessive memory consumption and redundant computation. In this work,
we present G-HEMP, the first framework that leverages multi-GPU systems to accelerate large-scale private
GCN inference. G-HEMP introduces two key innovations: (i) a block-diagonal parallel packing scheme that
eliminates redundant data replication in encrypted adjacency matrices, reducing the number of HE operations
and achieving up to a 4.41× speedup over conventional feature-wise packing in a single-GPU environment; and
(ii) a multi-GPU workload partitioning strategy that halves per-GPU peak memory usage on a 4-GPU system
and achieves up to a 3.88× latency improvement. Compared to the limb-level-partitioning-based approach in
Cinnamon, the state-of-the-art method for parallelizing encrypted computation, G-HEMP further attains up to a
3.13× gain owing to our superior multi-device partitioning policy. Overall, G-HEMP is model-agnostic and scales
seamlessly with graph size and GPU count, enabling efficient and practical privacy-preserving GCN inference in
modern heterogeneous environments.
Topics: Reliability & Security: Privacy and security in ML applications
Submission Number: 75