Differentially Private n-gram Extraction

Kunho Kim; Sivakanth Gopi; Janardhan Kulkarni; Sergey Yekhanin

Differentially Private n-gram Extraction

Kunho Kim, Sivakanth Gopi, Janardhan Kulkarni, Sergey Yekhanin

Published: 09 Nov 2021, Last Modified: 05 May 2023NeurIPS 2021 PosterReaders: Everyone

Keywords: Differential Privacy, Ngram Extraction

TL;DR: We present a new algorithm for differentially private n-gram extraction which is a useful primitive for NLP applications.

Abstract: We revisit the problem of $n$-gram extraction in the differential privacy setting. In this problem, given a corpus of private text data, the goal is to release as many $n$-grams as possible while preserving user level privacy. Extracting $n$-grams is a fundamental subroutine in many NLP applications such as sentence completion, auto response generation for emails, etc. The problem also arises in other applications such as sequence mining, trajectory analysis, etc., and is a generalization of recently studied differentially private set union (DPSU) by Gopi et al. (2020). In this paper, we develop a new differentially private algorithm for this problem which, in our experiments, significantly outperforms the state-of-the-art. Our improvements stem from combining recent advances in DPSU, privacy accounting, and new heuristics for pruning in the tree-based approach initiated by Chen et al. (2012).

Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.

Supplementary Material: pdf

Code: https://github.com/microsoft/differentially-private-ngram-extraction

9 Replies

Loading