Characterizing and Improving MPC-based Private Inference for Transformer-based Models

Published: 04 Nov 2021, Last Modified: 15 May 2023
PRIML 2021 Poster
Keywords: Multi-party Computation, Secure MPC, MPC, Transformer, Embedding Table, Natural Language Processing
TL;DR: We characterize the runtime of MPC-based inference for Transformer models and reduce the runtime of embedding table accesses under MPC.
Abstract: Secure multi-party computation (MPC) is gaining popularity with the growing demand for privacy-preserving cloud services. While MPC for convolutional neural networks (CNNs) has received plenty of attention, MPC-based private inference for Transformer models has not been studied in detail. This paper provides a characterization study of the performance overhead of running Transformer models under secure MPC and proposes an optimization for embedding tables. Our study shows that Transformers introduce two new challenges for MPC-based private inference: the softmax function and embedding tables. To address the overhead of embedding table accesses under MPC, we propose to use tensor-train (TT) decomposition, a mechanism that splits a large embedding table into multiple smaller tables. For the NLP workloads, the experiments show that TT decomposition can speed up embedding table accesses by 2x with only a 1.19 drop in the masked-language-model perplexity score.
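To make the embedding-table idea concrete, below is a minimal NumPy sketch of a tensor-train (TT) embedding lookup. It is not the paper's implementation and says nothing about the MPC protocol itself; the vocabulary and embedding factorizations (32x32x32 and 8x8x8), the TT-rank of 16, and the function names are illustrative assumptions. The point is only that a single row of a large V x E embedding table can be reconstructed from a few much smaller cores.

import numpy as np

# Assumed, illustrative shapes: vocab 32768 = 32*32*32, embedding dim 512 = 8*8*8, TT-rank 16.
VOCAB_FACTORS = (32, 32, 32)
EMB_FACTORS = (8, 8, 8)
TT_RANK = 16

def make_tt_cores(vocab_factors, emb_factors, rank, rng):
    """Random TT cores G_k of shape (r_{k-1}, v_k, e_k, r_k) with r_0 = r_d = 1."""
    d = len(vocab_factors)
    ranks = [1] + [rank] * (d - 1) + [1]
    return [rng.standard_normal((ranks[k], vocab_factors[k], emb_factors[k], ranks[k + 1]))
            for k in range(d)]

def tt_embedding_lookup(cores, vocab_factors, token_id):
    """Reconstruct one embedding row from the small TT cores instead of a full V x E table."""
    # Mixed-radix decomposition of the flat token id: one sub-index per core.
    idx = []
    for v_k in reversed(vocab_factors):
        idx.append(token_id % v_k)
        token_id //= v_k
    idx.reverse()
    # Chain the selected core slices along the TT ranks.
    out = np.ones((1, 1))                                   # (partial_emb = 1, r_0 = 1)
    for core, i_k in zip(cores, idx):
        slice_k = core[:, i_k, :, :]                        # (r_{k-1}, e_k, r_k)
        out = np.tensordot(out, slice_k, axes=([1], [0]))   # (partial_emb, e_k, r_k)
        out = out.reshape(-1, slice_k.shape[-1])            # (partial_emb * e_k, r_k)
    return out.reshape(-1)                                  # r_d = 1, so this is the full E-dim row

rng = np.random.default_rng(0)
cores = make_tt_cores(VOCAB_FACTORS, EMB_FACTORS, TT_RANK, rng)
vec = tt_embedding_lookup(cores, VOCAB_FACTORS, token_id=12345)
print(vec.shape)   # (512,) -- from ~74K core parameters instead of a 16.8M-parameter table

With these assumed shapes, each lookup touches roughly 74K core parameters rather than a 16.8M-parameter table, which is the kind of reduction the paper exploits to cut the cost of secure embedding accesses under MPC.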
Paper Under Submission: The paper is NOT under submission at NeurIPS