Privacy-Aware Time Series Synthesis via Public Knowledge Distillation

Published: 09 Nov 2025, Last Modified: 09 Nov 2025. Accepted by TMLR. License: CC BY 4.0
Abstract: Sharing sensitive time series data, such as patient records or investment accounts, is often restricted due to privacy concerns in domains such as finance, healthcare, and energy consumption. Privacy-aware synthetic time series generation addresses this challenge by injecting noise during training, which inherently introduces a trade-off between privacy and utility. In many cases, sensitive sequences are correlated with publicly available, non-sensitive contextual metadata (e.g., household electricity consumption may be influenced by weather conditions and electricity prices). However, existing privacy-aware data generation methods often overlook this opportunity, resulting in suboptimal privacy-utility trade-offs. In this paper, we present Pub2Priv, a novel framework for generating private time series data by leveraging heterogeneous public knowledge. Our model employs a self-attention mechanism to encode public data into temporal and feature embeddings, which serve as conditional inputs for a diffusion model that generates synthetic private sequences. Additionally, we introduce a practical metric that assesses privacy by evaluating the identifiability of the synthetic data. Experimental results show that Pub2Priv consistently outperforms state-of-the-art benchmarks in improving the privacy-utility trade-off across the finance, energy, and commodity trading domains.
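To make the encoding step in the abstract concrete, the following is a minimal NumPy sketch of how self-attention could turn a public series into temporal and feature embeddings. It is an illustration under stated assumptions, not the authors' implementation: the function names (`self_attention`, `encode_public`), the single-head design, the embedding width `d_model`, and the randomly initialised projection matrices are all hypothetical, and the diffusion model that would consume these embeddings as conditioning is omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over the rows of `tokens`."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores, axis=-1) @ v


def encode_public(public, d_model=8, seed=0):
    """Encode a public series of shape (T, F) into two embeddings.

    Returns (temporal_emb, feature_emb) with shapes (T, d_model) and
    (F, d_model). The projections are random placeholders (an assumption);
    a trained model would learn them.
    """
    rng = np.random.default_rng(seed)
    n_steps, n_features = public.shape
    w_time = [rng.normal(scale=n_features ** -0.5, size=(n_features, d_model))
              for _ in range(3)]
    w_feat = [rng.normal(scale=n_steps ** -0.5, size=(n_steps, d_model))
              for _ in range(3)]
    temporal_emb = self_attention(public, *w_time)   # tokens are time steps
    feature_emb = self_attention(public.T, *w_feat)  # tokens are features
    return temporal_emb, feature_emb


# Example: 24 hourly steps of 3 public features (e.g., weather and price).
public = np.random.default_rng(1).normal(size=(24, 3))
temporal_emb, feature_emb = encode_public(public)
print(temporal_emb.shape, feature_emb.shape)
```

Attending once over time steps and once over features is one plausible reading of "temporal and feature embeddings"; the two outputs would then be passed to the generator as conditional inputs.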
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: We thank the action editor for the helpful feedback. We have uploaded a camera-ready version that includes the required revisions.
Assigned Action Editor: ~Chuan-Sheng_Foo1
Submission Number: 5190