Efficient Privacy-Preserving Data Annotation via Active PrivBayes Synthetic Data Generation

Osamu Saisho, Takayuki Miura, Kazuki Iwahana, Masanobu Kii, Rina Okada

Published: 2025, Last Modified: 27 May 2026, PerCom Workshops 2025, CC BY-SA 4.0
Abstract: Human involvement is essential for building training datasets and AI models, especially in pervasive computing where sensitive real-world data are used for novel applications. Data annotation is typically performed by humans to prepare data for AI applications. While privacy-preserving techniques such as federated learning and secure computation have been widely studied for AI training and inference, they do not cover the entire AI lifecycle or address human-data interaction. This study proposes and demonstrates a method for efficient privacy-preserving human annotation using synthetic data generation integrated with both active learning and differential privacy. Specifically, the method iteratively generates synthetic data containing only explanatory variables from real-world data with differential privacy guarantees, incorporating the acquisition function from active learning into the generation process. Experimental results demonstrate that the proposed method improves the efficiency of human annotation work compared to a simple combination of existing methods. Moreover, it overcomes the trade-off between privacy preservation and AI model performance, achieving both stricter privacy guarantees and higher model accuracy.
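The iterative loop described in the abstract can be illustrated with a toy sketch. This is not the authors' algorithm. It replaces PrivBayes with a plain Laplace-noised histogram over a single feature, uses a hand-written uncertainty score as the acquisition function, and assumes simple sequential composition of the privacy budget across rounds. All function names, the bin count, and the parameters (`eps_per_round`, `n_per_round`) are hypothetical and chosen only for illustration.

```python
import numpy as np

def acquisition(x, boundary, width=0.1):
    # Toy uncertainty score in (0, 1]: highest near the current decision boundary.
    # It depends only on the model state, which is derived from DP outputs.
    return np.exp(-np.abs(x - boundary) / width)

def dp_weighted_histogram(x, weights, bins, epsilon, rng):
    # Laplace mechanism on an acquisition-weighted histogram.
    # Weights are clipped to [0, 1], so adding or removing one record changes
    # the histogram by at most 1 in L1 norm (sensitivity 1).
    weights = np.clip(weights, 0.0, 1.0)
    counts, _ = np.histogram(x, bins=bins, weights=weights)
    noisy = counts + rng.laplace(scale=1.0 / epsilon, size=counts.shape)
    return np.clip(noisy, 0.0, None)  # post-processing, no extra privacy cost

def sample_synthetic(noisy_counts, bins, n, rng):
    # Draw synthetic explanatory values from the noisy histogram.
    total = noisy_counts.sum()
    if total > 0:
        probs = noisy_counts / total
    else:
        probs = np.full(len(noisy_counts), 1.0 / len(noisy_counts))
    idx = rng.choice(len(probs), size=n, p=probs)
    return rng.uniform(bins[idx], bins[idx + 1])

def active_dp_annotation_sketch(private_x, oracle, rounds, eps_per_round,
                                n_per_round, seed=0):
    rng = np.random.default_rng(seed)
    bins = np.linspace(0.0, 1.0, 11)  # assumption: feature lies in [0, 1]
    boundary = 0.5                    # initial guess for a threshold model
    xs, ys = np.empty(0), np.empty(0, dtype=int)
    for _ in range(rounds):
        w = acquisition(private_x, boundary)
        noisy = dp_weighted_histogram(private_x, w, bins, eps_per_round, rng)
        synth = sample_synthetic(noisy, bins, n_per_round, rng)
        labels = oracle(synth)        # the human annotator sees only synthetic data
        xs, ys = np.concatenate([xs, synth]), np.concatenate([ys, labels])
        neg, pos = xs[ys == 0], xs[ys == 1]
        if len(neg) and len(pos):     # refit the threshold on labeled synthetic data
            boundary = (neg.max() + pos.min()) / 2.0
    total_epsilon = rounds * eps_per_round  # sequential composition
    return boundary, xs, ys, total_epsilon
```

In this sketch the annotator (`oracle`) labels only synthetic points, and each round concentrates the synthetic samples where the current model is least certain. A call such as `active_dp_annotation_sketch(x, lambda v: (v > 0.3).astype(int), rounds=3, eps_per_round=0.5, n_per_round=20)` spends a total budget of 1.5 under the stated composition assumption.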