Multidimensional categorical data collection under shuffled differential privacy

Ning Wang, Jian Zhuang, Zhigang Wang, Zhiqiang Wei, Yu Gu, Peng Tang, Ge Yu

Published: 01 Jan 2025, Last Modified: 15 May 2025
Venue: Comput. Secur. 2025
License: CC BY-SA 4.0

Abstract: Estimating frequency distributions in multidimensional categorical data is fundamental to many real-world applications, but such data often contains sensitive personal information and therefore requires robust privacy protection. The emerging shuffled differential privacy (SDP) model offers a promising solution, yet existing methods are either limited to single-dimensional data or suffer from poor accuracy in multidimensional settings. To address these challenges, this paper introduces the Multiple Hash Mechanism (MHM), which uses an innovative hash-based local perturbation technique for efficient dimensionality reduction, improving result accuracy under the SDP framework. We also provide a detailed analysis of the benefits that shuffling brings to MHM outputs, showing significant accuracy improvements. For cases requiring personalized privacy levels, we propose the Overlapping Group Mechanism, which further increases the shuffling benefits and boosts overall accuracy. Experimental results on real-world datasets validate the effectiveness of the proposed methods.
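The abstract does not spell out how MHM's hashing works. As a point of reference only, the sketch below illustrates the generic idea of hash-based local perturbation for a multidimensional record, assuming an optimized-local-hashing (OLH) style randomizer applied to the whole tuple. Each user hashes their d-dimensional record into g buckets with a fresh random seed (collapsing a joint domain of size k_1 × ... × k_d down to g), perturbs the bucket with generalized randomized response, and a shuffler permutes the reports before the analyzer computes unbiased counts. The helper names `bucket`, `local_randomizer`, `shuffle`, and `estimate`, the SHA-256 hash, and the toy data are hypothetical. This is not the paper's MHM or Overlapping Group Mechanism, and the privacy amplification from shuffling is not computed here.

```python
import hashlib
import math
import random
from typing import Hashable, List, Sequence, Tuple

Record = Tuple[Hashable, ...]   # one user's d categorical attribute values
Report = Tuple[int, int]        # (hash seed, perturbed bucket)


def bucket(seed: int, record: Record, g: int) -> int:
    """Hypothetical helper: seeded hash of a whole record into one of g buckets."""
    digest = hashlib.sha256(f"{seed}|{record!r}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % g


def local_randomizer(record: Record, eps: float, g: int, rng: random.Random) -> Report:
    """Client side (assumed OLH-style): hash the record to [g], then apply GRR over [g]."""
    seed = rng.getrandbits(32)
    h = bucket(seed, record, g)
    p = math.exp(eps) / (math.exp(eps) + g - 1)  # probability of keeping the true bucket
    if rng.random() < p:
        return seed, h
    other = rng.randrange(g - 1)  # uniform over the g - 1 other buckets
    return seed, other if other < h else other + 1


def shuffle(reports: List[Report], rng: random.Random) -> List[Report]:
    """Shuffler: randomly permutes reports so the analyzer cannot link them to users."""
    out = list(reports)
    rng.shuffle(out)
    return out


def estimate(reports: Sequence[Report], candidate: Record, eps: float, g: int) -> float:
    """Analyzer: unbiased count estimate for one candidate record."""
    n = len(reports)
    p = math.exp(eps) / (math.exp(eps) + g - 1)
    # A report "supports" the candidate if the candidate hashes to the reported bucket.
    support = sum(1 for seed, y in reports if bucket(seed, candidate, g) == y)
    # E[support] = n_v * p + (n - n_v) / g, solved for n_v.
    return (support - n / g) / (p - 1 / g)


# Toy usage with two attributes per user (hypothetical data).
rng = random.Random(7)
eps, g = 2.0, 8  # g close to e^eps + 1, the usual OLH choice
users = [("A", "x")] * 10_000 + [("B", "y")] * 6_000 + [("C", "z")] * 4_000
reports = shuffle([local_randomizer(r, eps, g, rng) for r in users], rng)
print(round(estimate(reports, ("A", "x"), eps, g)))  # unbiased but noisy estimate of 10000
```

Hashing the full tuple keeps each report to a single small integer regardless of d, which is the dimensionality-reduction effect the abstract refers to. The cost is that the analyzer must test every candidate tuple against every report.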