Rajesh Bordawekar, Oded Shmueli

Feb 12, 2018 (modified: Jun 04, 2018) ICLR 2018 Workshop Submission
  • Abstract: Database information may be disclosed in a variety of ways depending on the sensitivity of the stored information and the recipient’s need to know. Traditionally, researchers have been concerned with preventing a recipient of the information from associating sensitive information (e.g., a disease) with specific individuals. However, other concerns may apply. For example, within an enterprise (Domingo-Ferrer et al., 2016), certain test results may be considered sensitive and should only be openly disclosed to divisions concerned with these results. On the other hand, disclosing as much information as possible may also be in the enterprise’s interest, as it is not always clear what information may actually be useful to a division. We propose a mechanism that allows a discloser to exercise fine control over what is disclosed and allows information to be disclosed indirectly rather than directly. The mechanism is based on word embedding, a technique from Natural Language Processing (NLP) in which each word is associated with a low-dimensional (say, 200-dimensional) vector of real numbers. These vectors are constructed so as to capture the meaning of the associated words. By disclosing vectors constructed from sensitive information, rather than the information itself, we achieve varying degrees of disclosure.
  • Keywords: Word Embedding, Data Privacy, Relational Databases
  • TL;DR: Using word embedding approaches to enable data privacy in relational databases
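To make the idea of disclosing vectors instead of values concrete, here is a minimal sketch under stated assumptions. The record names and the 4-dimensional vectors are hypothetical and hand-picked for readability; they are not trained embeddings and not the authors' construction. The recipient receives only one vector per record and ranks records by cosine similarity, learning which records resemble one another without seeing the underlying sensitive values.

```python
import math

# Hypothetical, hand-picked vectors for illustration only. A real system would
# derive them from a trained word-embedding model (e.g., ~200 dimensions);
# 4 dimensions keep the example readable.
disclosed = {
    "record_17": [0.9, 0.1, 0.0, 0.2],
    "record_42": [0.8, 0.2, 0.1, 0.3],
    "record_73": [0.0, 0.9, 0.8, 0.1],
}


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(key, vectors):
    """Return the other records ranked by cosine similarity to `key`, highest first."""
    target = vectors[key]
    scores = [(other, cosine(target, vec))
              for other, vec in vectors.items() if other != key]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# The recipient sees only vectors, never the sensitive values behind them,
# yet can still tell which record is closest to record_17.
ranking = most_similar("record_17", disclosed)
print(ranking[0][0])  # prints record_42
```

Cosine similarity is the customary way to compare embedding vectors because it ignores vector length and focuses on direction. How much the vectors actually reveal depends on how they are constructed, which this sketch does not model.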