Abstract: The past few years have seen an explosion in the volume of geo-referenced data, a trend also observed in the world
of official statistics: large-scale imputation, generalizing survey results to the whole population, has become increasingly
common thanks to the efficiency and flexibility of new machine learning algorithms. Official agencies are now capable
of providing realistic estimates of population characteristics at lower aggregation levels than ever before, but communicating
survey results at ever finer geographical scales strongly increases privacy risks. Thus, in order to maintain trust between
populations and their administrations, official statistical offices must ensure the highest levels of confidentiality.
In this context, Differential Privacy (DP) has been successfully applied to protect individuals' privacy through the addition of
properly scaled random noise. We first discuss the specificities of DP applied to regionalized statistics and present a
baseline framework that minimizes the amount of noise necessary to control disclosure risk when releasing
spatial aggregates. The technical readiness of the framework is illustrated through a synthetic case study based on Swiss
poverty statistics using the OpenDP Library. Finally, we discuss some limitations of the DP framework for controlling the
disclosure risk of geo-referenced data and present some ongoing themes of research.
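
For illustration, the noise-addition principle mentioned above can be sketched in a few lines of Python. The snippet below is a minimal, generic Laplace mechanism for a numeric query with known global sensitivity; it is not the paper's OpenDP-based framework, and the sensitivity, epsilon, and query values shown are hypothetical.

    import numpy as np

    def laplace_mechanism(true_value: float, sensitivity: float,
                          epsilon: float, rng=None) -> float:
        """Return a noisy release of `true_value` satisfying epsilon-DP.

        `sensitivity` is the global sensitivity of the query, i.e. the
        maximum change in its result when one individual's record is
        added or removed; the Laplace noise scale is sensitivity / epsilon.
        """
        rng = rng or np.random.default_rng()
        return true_value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

    # Hypothetical example: a regional count query (sensitivity 1)
    # released with a privacy budget of epsilon = 0.5.
    noisy_count = laplace_mechanism(true_value=128.0, sensitivity=1.0, epsilon=0.5)
    print(round(noisy_count))

Note that smaller epsilon values yield stronger privacy guarantees but larger noise, which is precisely the trade-off the framework must balance when releasing aggregates for small geographical areas.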