Efficient Memory Side-Channel Protection for Embedding Generation in Machine Learning

Muhammad Umar, Akhilesh Parag Marathe, Monami Dutta Gupta, Shubham Jogprakash Ghosh, G. Edward Suh, Wenjie Xiong

Published: 2025, Last Modified: 15 May 2025, HPCA 2025, CC BY-SA 4.0

Abstract: Modern machine learning (ML) models need to process both continuous and categorical/discrete feature values. For example, deep learning recommendation models (DLRMs) rely on users’ categorical features to make recommendations, and large language models (LLMs) take discrete words/tokens as input. ML models process such discrete features by converting them to numerical vectors called embeddings. Unfortunately, embedding table lookups are vulnerable to side-channel attacks, because the table indices leak the input feature values. Due to the size of the embedding tables, using conventional oblivious computing techniques such as ORAM to protect memory access patterns to the tables incurs significant overhead. In this paper, we propose to use a different technique, Deep Hash Embedding (DHE), to secure embedding table accesses, even though it is not commonly used today due to its compute-intensive nature. We investigate three embedding generation methods with side-channel protection: linear scan of the embedding table, embedding table protected by ORAM, and DHE. Our experiments on DLRMs and LLMs show that DHE, or a hybrid scheme combining DHE and linear scan, can significantly improve both performance and memory footprint compared to the conventional ORAM protection. For DLRM on Criteo datasets, our hybrid scheme improves performance by about $4\times$ for large embedding tables, and by up to $3.08\times$ end-to-end over the optimized ORAM baseline without any loss in accuracy, while reducing the model memory footprint by up to $1116\times$. For a GPT-2 LLM, using DHE speeds up the prompt prefill by up to $1.32\times$ and decoding by up to $1.07\times$ over ORAM, depending on the batch size, with comparable output quality.
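To make two of the three methods named in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It contrasts a linear scan, which reads every table row so that the access pattern does not depend on the index, with a DHE-style generator, which hashes the index into a dense vector and passes it through a small MLP with no table at all. The function names, hash family, layer sizes, and constants (`prime`, `buckets`) are illustrative assumptions, and NumPy itself gives no constant-time guarantees.

```python
import numpy as np

def linear_scan_lookup(table, index):
    """Return table[index] while touching every row of the table.

    A one-hot mask selects the target row, so the sequence of memory
    reads is the same for every index (illustrative only).
    """
    num_rows = table.shape[0]
    mask = (np.arange(num_rows) == index).astype(table.dtype)
    return mask @ table  # weighted sum over all rows

def dhe_embedding(index, hash_params, weights,
                  prime=2_147_483_647, buckets=1_000_000):
    """DHE-style embedding: k hashes -> dense vector -> small MLP.

    hash_params: (a, b), int64 arrays of shape (k,) (assumed hash family
                 h_i(x) = ((a_i * x + b_i) mod prime) mod buckets).
    weights:     list of (W, bias) pairs for the MLP layers.
    """
    a, b = hash_params
    hashed = ((a * index + b) % prime) % buckets
    # Scale the hash values to [-1, 1] before feeding the MLP.
    x = hashed.astype(np.float64) / (buckets - 1) * 2.0 - 1.0
    for w, bias in weights[:-1]:
        x = np.maximum(x @ w + bias, 0.0)  # ReLU hidden layers
    w, bias = weights[-1]
    return x @ w + bias  # linear output layer

rng = np.random.default_rng(0)

# Linear scan over a tiny 8-row table with 4-dimensional embeddings.
table = rng.standard_normal((8, 4))
emb_scan = linear_scan_lookup(table, 3)

# DHE with k = 16 hash functions and one hidden layer (untrained weights).
k, hidden, dim = 16, 32, 4
a = rng.integers(1, 2_147_483_647, size=k, dtype=np.int64)
b = rng.integers(0, 2_147_483_647, size=k, dtype=np.int64)
weights = [
    (rng.standard_normal((k, hidden)), np.zeros(hidden)),
    (rng.standard_normal((hidden, dim)), np.zeros(dim)),
]
emb_dhe = dhe_embedding(3, (a, b), weights)
```

In this sketch the linear scan costs time proportional to the table size, while the DHE path performs a fixed amount of computation and stores only the hash parameters and MLP weights. That difference is one way to read the abstract's suggestion of a hybrid scheme that combines DHE with linear scan.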