USING WORD EMBEDDING TO SELECTIVELY DISCLOSE DATABASE INFORMATION
Rajesh Bordawekar, Oded Shmueli
Feb 12, 2018 (modified: Feb 12, 2018), ICLR 2018 Workshop Submission, readers: everyone
Abstract: Database information may be disclosed in a variety of ways, depending on the sensitivity of the stored information and the recipient’s need to know. Traditionally, researchers have been concerned with preventing a recipient of the information from associating sensitive information (e.g., a disease) with specific individuals. However, other concerns may apply. For example, within an enterprise (Domingo-Ferrer et al., 2016), certain test results may be considered sensitive and should only be openly disclosed to divisions concerned with these results. On the other hand, disclosing as much information as possible may also be in the enterprise’s interest, as it is not always clear what information may actually be useful to a division. We propose a mechanism that allows a discloser to exercise fine control over what is being disclosed and to disclose information indirectly rather than directly. The mechanism is based on word embedding, a technique from natural language processing (NLP) in which each word is associated with a low-dimensional (say, 200-dimensional) vector of real numbers. These vectors are constructed so as to capture the meaning of the associated words. By disclosing vectors constructed from sensitive information, rather than the information itself, we achieve varying degrees of disclosure.
TL;DR: Using word embedding approaches to enable data privacy in relational databases
Keywords: Word Embedding, Data Privacy, Relational Databases
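The core idea in the abstract, disclosing vectors derived from sensitive values instead of the values themselves, can be illustrated with a small Python sketch. This is not the authors' implementation: the table, the column names, and the functions `row_to_tokens`, `cooccurrence_vectors`, and `disclose` are all hypothetical, and simple row-level co-occurrence counts stand in for a trained low-dimensional word embedding.

```python
import math

# Hypothetical toy table; column names and values are illustrative only.
ROWS = [
    {"employee": "e1", "division": "cardiology", "test": "test_a"},
    {"employee": "e2", "division": "cardiology", "test": "test_a"},
    {"employee": "e3", "division": "oncology", "test": "test_b"},
    {"employee": "e4", "division": "oncology", "test": "test_b"},
]


def row_to_tokens(row):
    # Turn one row into a "sentence" of column=value tokens.
    return [f"{col}={val}" for col, val in sorted(row.items())]


def cooccurrence_vectors(rows):
    # Assumption: co-occurrence counts stand in for a trained embedding.
    # Each token's vector counts how often it shares a row with every
    # other token in the vocabulary.
    vocab = sorted({tok for row in rows for tok in row_to_tokens(row)})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = {tok: [0.0] * len(vocab) for tok in vocab}
    for row in rows:
        tokens = row_to_tokens(row)
        for a in tokens:
            for b in tokens:
                if a != b:
                    vectors[a][index[b]] += 1.0
    return vectors


def cosine(u, v):
    # Cosine similarity; returns 0.0 if either vector is all zeros.
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def disclose(rows, sensitive_column):
    # Replace each value in the sensitive column with its vector;
    # all other columns are passed through unchanged.
    vectors = cooccurrence_vectors(rows)
    disclosed = []
    for row in rows:
        out = {}
        for col, val in row.items():
            if col == sensitive_column:
                out[col] = list(vectors[f"{col}={val}"])
            else:
                out[col] = val
        disclosed.append(out)
    return disclosed


disclosed = disclose(ROWS, "test")
same = cosine(disclosed[0]["test"], disclosed[1]["test"])
different = cosine(disclosed[0]["test"], disclosed[2]["test"])
```

In this sketch the recipient never sees the literal test names, yet can still compare rows: rows that share a test value receive identical vectors, so `same` is close to 1, while the two tests here co-occur with disjoint sets of tokens, so `different` is 0. How much a real embedding reveals would depend on how the vectors are trained and which of them are released, which this toy does not model.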