Keywords: Differential Privacy, Machine Learning, Query Release, Synthetic Data, Deep Learning
TL;DR: We initiate the study of differentially private query release for hierarchical data.
Abstract: While differentially private query release has been well studied, research in this area is commonly restricted to data that do not exhibit hierarchical structure. However, in many real-world scenarios, individual data points can be grouped together (e.g., people within households or taxi trips per driver), raising the question: what statistical properties (or queries) are important when considering data of this form? In addition, although synthetic data generation approaches for private query release have grown increasingly popular, it is unclear how one can generate synthetic data at both the group and individual levels while capturing such statistical properties. In light of these challenges, we formalize the problem of hierarchical query release and provide a set of statistical queries that capture relationships between attributes at both the group and individual levels. Furthermore, we propose and implement a novel synthetic data generation algorithm, H-GEM, which outputs hierarchical data subject to differential privacy to answer such statistical queries. Finally, using the American Community Survey, we evaluate H-GEM, establishing a benchmark for future work to measure against.