Privacy-Aware Time Series Synthesis via Public Knowledge Distillation

TMLR Paper5190 Authors

24 Jun 2025 (modified: 26 Jun 2025)Under review for TMLREveryoneRevisionsBibTeXCC BY 4.0
Abstract: Sharing sensitive time series data in domains such as finance, healthcare, and energy consumption—such as patient records or investment accounts—is often restricted due to privacy concerns. Privacy-aware synthetic time series generation addresses this challenge by enforcing noise during training, inherently introducing a trade-off between privacy and utility. In many cases, sensitive sequences is correlated with publicly available, non-sensitive contextual metadata (e.g., household electricity consumption may be influenced by weather conditions and electricity prices). However, existing privacy-aware data generation methods often overlook this opportunity, resulting in suboptimal privacy-utility trade-offs. In this paper, we present Pub2Priv, a novel framework for generating private time series data by leveraging heterogeneous public knowledge. Our model employs a self-attention mechanism to encode public data into temporal and feature embeddings, which serve as conditional inputs for a diffusion model to generate synthetic private sequences. Additionally, we introduce a practical metric to assess privacy by evaluating the identifiability of the synthetic data. Experimental results show that Pub2Priv consistently outperforms state-of-the-art benchmarks in improving the privacy-utility trade-off across finance, energy, and commodity trading domains.
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Chuan-Sheng_Foo1
Submission Number: 5190
Loading