Privacy-Preserving Unsupervised Spherical Text Embeddings

Published: 01 Jan 2024, Last Modified: 19 May 2025, IJCNN 2024, CC BY-SA 4.0
Abstract: Conventional text embeddings, typically learned in Euclidean space, may struggle to capture word semantics from the directional similarity between word vectors alone. In contrast, spherical text embeddings have recently proven effective in a range of natural language processing (NLP) tasks. However, training spherical text generative models (STGMs) requires large, representative datasets, which may contain sensitive private information. To mitigate this concern, we propose the differentially private spherical text generative model (DP-STGM), which learns text embeddings in the spherical space while ensuring privacy through efficient Riemannian optimization and the framework of differential privacy. To evaluate our privacy-preserving algorithm, we first train an adversary on an external dataset without applying differential privacy. We then introduce two metrics to measure the model's ability to protect privacy: (1) the cosine similarity between the words recovered by the adversary, which generates spherical text embeddings, and those generated by DP-STGM, and (2) Top-n rank correlation. Our experiments show that DP-STGM outperforms baseline models. By applying differential privacy within the Riemannian optimization process, our model better protects sensitive information while still capturing the semantic nuances inherent in word embeddings. DP-STGM is therefore a robust and efficient solution for NLP tasks that require privacy protection without compromising the quality of the learned text embeddings. As a privacy-preserving alternative, it broadens the range of applications in which STGMs can be safely employed. Our work opens new avenues for future research in privacy-aware NLP and advances the state of the art in both privacy protection and semantic learning in this domain.
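To make the combination of Riemannian optimization and differential privacy more concrete, the following is a minimal sketch of one noisy gradient step on the unit sphere, together with the cosine similarity used in the first metric. It is an illustration under assumptions, not the DP-STGM algorithm itself: the function names `dp_sphere_step` and `cosine_similarity`, the parameters `lr`, `clip`, and `sigma`, and the choice of clipping followed by tangent-space Gaussian noise are all hypothetical, and no calibration of `sigma` to a formal privacy budget is attempted here.

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def dp_sphere_step(x, grad, lr=0.1, clip=1.0, sigma=1.0, rng=None):
    """One noisy Riemannian gradient step for a unit vector x.

    Assumed recipe (not taken from the paper): project the Euclidean
    gradient onto the tangent space at x, clip its norm, add Gaussian
    noise restricted to the tangent space, then retract to the sphere
    by renormalizing.
    """
    rng = np.random.default_rng() if rng is None else rng

    # Riemannian gradient: remove the component of grad along x.
    rgrad = grad - np.dot(grad, x) * x

    # Clip so the gradient norm is at most `clip` (bounds sensitivity).
    rgrad = rgrad / max(1.0, np.linalg.norm(rgrad) / clip)

    # Gaussian noise scaled by the clipping bound, kept in the tangent space.
    noise = rng.normal(0.0, sigma * clip, size=x.shape)
    noise = noise - np.dot(noise, x) * x

    # Step in the tangent space, then retract by renormalizing.
    y = x - lr * (rgrad + noise)
    return y / np.linalg.norm(y)


x0 = np.array([1.0, 0.0, 0.0])
g = np.array([0.0, 1.0, 0.0])
x1 = dp_sphere_step(x0, g, sigma=0.0)  # sigma=0 disables the noise
x2 = dp_sphere_step(x0, g, sigma=1.0, rng=np.random.default_rng(0))
```

Because the update stays in the tangent space at a unit vector, the intermediate point always has norm at least 1, so the renormalization is well defined. In a real training loop the clipping would be applied per example before averaging, as is usual in differentially private gradient methods.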