USING WORD EMBEDDING TO SELECTIVELY DISCLOSE DATABASE INFORMATION

Rajesh Bordawekar; Oded Shmueli

12 Feb 2018 (modified: 05 May 2023); ICLR 2018 Workshop Submission; Readers: Everyone

Abstract: Database information may be disclosed in a variety of ways depending on the sensitivity of the stored information and the recipient’s need to know. Traditionally, researchers have been concerned with preventing a recipient of the information from associating sensitive information (e.g., a disease) with specific individuals. However, other concerns may apply. For example, within an enterprise (Domingo-Ferrer et al., 2016), certain test results may be considered sensitive and should only be openly disclosed to divisions concerned with these results. On the other hand, disclosing as much information as possible may also be in the enterprise’s interest, as it is not always clear what information may actually be useful to a division. We propose a mechanism that allows a discloser to exercise fine control over what is disclosed and to disclose information indirectly rather than directly. The mechanism is based on word embedding, a technique from natural language processing (NLP) in which each word is associated with a low-dimensional (say, 200-dimensional) vector of real numbers. These vectors are constructed so as to capture the meaning of the associated words. By disclosing vectors constructed from sensitive information, rather than the information itself, we achieve degrees of disclosure.
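
The idea of disclosing vectors instead of values can be illustrated with a minimal sketch. It is not the authors' mechanism: the token names, the 3-dimensional vectors, the row layout, and the averaging step are all hypothetical choices made for illustration. The recipient receives only row identifiers and vectors, and can still compare rows by cosine similarity without seeing the sensitive test result.

```python
import math

# Hypothetical toy embeddings. Real embeddings would be learned from the
# database text and have far more dimensions (the abstract suggests ~200).
TOKEN_VECTORS = {
    "test_positive": [0.9, 0.1, 0.0],
    "test_negative": [-0.8, 0.2, 0.1],
    "dept_cardiology": [0.1, 0.9, 0.0],
    "dept_oncology": [0.0, 0.8, 0.3],
}

# Hypothetical rows: row id -> tokens, one of which is a sensitive test result.
ROWS = {
    "row1": ["dept_cardiology", "test_positive"],
    "row2": ["dept_oncology", "test_positive"],
    "row3": ["dept_cardiology", "test_negative"],
}


def row_vector(tokens):
    """Average the token vectors of a row (an assumed, simple way to embed a row)."""
    dim = len(next(iter(TOKEN_VECTORS.values())))
    total = [0.0] * dim
    for token in tokens:
        for i, x in enumerate(TOKEN_VECTORS[token]):
            total[i] += x
    return [x / len(tokens) for x in total]


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def disclose(rows):
    """Release row ids and vectors only; the underlying tokens stay private."""
    return {row_id: row_vector(tokens) for row_id, tokens in rows.items()}


disclosed = disclose(ROWS)

# The recipient can compare rows without access to the raw values.
sim_12 = cosine(disclosed["row1"], disclosed["row2"])  # both rows share the same test result
sim_13 = cosine(disclosed["row1"], disclosed["row3"])  # the test results differ
print(sim_12 > sim_13)
```

In this toy setup the two rows that share a test result end up closer together than the two that differ, so the vectors carry some of the sensitive signal indirectly. How much they reveal depends on how they are constructed and which ones are released, which is one way to read the "degrees of disclosure" in the abstract.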

TL;DR: Using word embedding approaches to enable data privacy in relational databases

Keywords: Word Embedding, Data Privacy, Relational Databases
