- Keywords: Multi-party Computation, Secure MPC, MPC, Transformer, Embedding Table, Natural Language Processing
- TL;DR: We characterized the runtime of Transformer-based model inference under MPC and reduced the runtime of embedding table accesses in MPC.
- Abstract: Secure multi-party computation (MPC) is gaining popularity with the growing demand for privacy-preserving cloud services. While MPC for convolutional neural networks (CNNs) has received considerable attention, MPC-based private inference for Transformer models has not been studied in detail. This paper provides a characterization study of the performance overhead of running Transformer models with secure MPC, and proposes an optimization for embedding tables. Our study shows that Transformers introduce two new challenges for MPC-based private inference: softmax and embedding tables. To address the overhead of embedding table accesses under MPC, we propose to use tensor-train (TT) decomposition, a mechanism that splits a large embedding table into multiple smaller embedding tables. For NLP workloads, the experiments show that TT decomposition can speed up embedding table accesses by 2x with only a 1.19 drop in the masked-language model perplexity score.
- Paper Under Submission: The paper is NOT under submission at NeurIPS
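
The tensor-train idea named in the abstract can be illustrated with a minimal plaintext sketch. It is not the paper's implementation and involves no secret sharing: it only shows how one embedding row can be rebuilt from several small cores instead of being read from one large table. The function names (`make_tt_cores`, `tt_lookup`, `tt_to_dense`), the factor shapes, and the rank below are all hypothetical choices made for illustration.

```python
import numpy as np


def make_tt_cores(vocab_factors, dim_factors, rank, seed=0):
    """Create random TT cores (illustrative values, not trained weights).

    Core k has shape (r_k, v_k, d_k, r_{k+1}), with boundary ranks equal to 1.
    The represented table has prod(vocab_factors) rows and prod(dim_factors) columns.
    """
    rng = np.random.default_rng(seed)
    ranks = [1] + [rank] * (len(vocab_factors) - 1) + [1]
    return [
        rng.standard_normal((ranks[k], v, d, ranks[k + 1])) * 0.1
        for k, (v, d) in enumerate(zip(vocab_factors, dim_factors))
    ]


def tt_lookup(cores, vocab_factors, index):
    """Rebuild one embedding row from the small cores."""
    # Split the flat row index into one sub-index per core (row-major order).
    sub_indices = np.unravel_index(index, vocab_factors)
    out = np.ones((1, 1))  # shape: (accumulated embedding dim, current rank)
    for core, i_k in zip(cores, sub_indices):
        core_slice = core[:, i_k, :, :]  # shape: (r_prev, d_k, r_next)
        # Contract over the shared rank, then fold d_k into the accumulated dim.
        out = np.einsum("ar,rds->ads", out, core_slice)
        out = out.reshape(-1, core_slice.shape[-1])
    return out[:, 0]  # final rank is 1, so this has prod(dim_factors) entries


def tt_to_dense(cores):
    """Materialize the full table from three cores (for checking only)."""
    full = np.einsum("aibr,rjcs,skdt->ijkbcd", *cores)
    v = full.shape[0] * full.shape[1] * full.shape[2]
    return full.reshape(v, -1)


# Hypothetical sizes: a 120 x 32 table factored as (4*5*6) x (2*4*4), TT rank 3.
vocab_factors, dim_factors = (4, 5, 6), (2, 4, 4)
cores = make_tt_cores(vocab_factors, dim_factors, rank=3)
dense = tt_to_dense(cores)
row = tt_lookup(cores, vocab_factors, index=77)

tt_params = sum(core.size for core in cores)
print("row matches dense table:", np.allclose(row, dense[77]))
print("TT parameters:", tt_params, "dense parameters:", dense.size)
```

In this sketch each lookup touches only one slice per core, so the amount of data involved grows with the sum of the core sizes rather than with the size of the full table. How that translates into MPC runtime depends on the protocol used for the lookups, which this example does not model.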