History-Aware Privacy Budget Allocation for Model Training on Evolving Data-Sharing Platforms

Published: 01 Jan 2024, Last Modified: 10 Feb 2025 · IEEE Trans. Serv. Comput. 2024 · CC BY-SA 4.0
Abstract: Publicly released machine learning (ML) models are susceptible to malicious attacks (e.g., gradient leakage attacks) that may expose sensitive training data from data-sharing platforms to untrusted third parties. To preserve the privacy of training data, differential privacy (DP) is applied to bound the amount of privacy leakage by a predefined budget, which is in fact a non-recoverable resource. Under DP, allocating privacy budgets to ML queries is a non-trivial but crucial problem, because a portion of the non-recoverable privacy budget is consumed each time a data block is assigned to a query. Meanwhile, both data blocks and ML queries arrive continuously, which further complicates the problem. Most existing works rely on greedy algorithms that make myopic allocation decisions far from the optimum. In this paper, we propose a novel History-aware Privacy Budget Allocation (HPBA) algorithm for data-sharing platforms to address these challenges. Unlike existing works, HPBA leverages historical query records to approximate global ML query patterns, thereby overcoming the shortsightedness of greedy algorithms. Moreover, the performance of HPBA is theoretically guaranteed via competitive analysis. A lightweight variant, S-HPBA, further reduces computation overhead by using fewer historical records. Experimental results demonstrate that, compared to state-of-the-art baselines, HPBA and S-HPBA improve average model accuracy by 32.8% and 16.2%, respectively.
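To make the allocation problem concrete, the following is a minimal sketch, not the authors' HPBA algorithm: it contrasts a purely greedy admission rule with a simple history-aware one that admits a query only if its utility clears a threshold estimated from past query records. The (utility, epsilon-demand) query model, the `history_aware_allocate` helper, and the 0.7 quantile threshold are all illustrative assumptions, not details from the paper.

```python
import numpy as np

def history_aware_allocate(history_utilities, remaining_budget, query):
    """Hypothetical history-aware rule: admit a query only if its utility
    clears a threshold estimated from historical query records, so that
    scarce budget is reserved for high-utility future queries."""
    utility, eps_demand = query
    if eps_demand > remaining_budget:
        return False  # budget is non-recoverable; never over-allocate
    # Approximate the utility distribution of incoming queries from history
    # and only admit queries in its upper range (0.7 quantile is arbitrary).
    threshold = np.quantile(history_utilities, 0.7) if history_utilities else 0.0
    return utility >= threshold

# Toy online stream: each arriving ML query carries (utility, epsilon demand).
rng = np.random.default_rng(0)
history = list(rng.uniform(0, 1, size=200))  # utilities of past queries
budget = 1.0                                 # total budget of one data block
for _ in range(20):
    q = (rng.uniform(0, 1), 0.1)
    if history_aware_allocate(history, budget, q):
        budget -= q[1]                       # consume non-recoverable budget
    history.append(q[0])                     # record query for future decisions
print(f"remaining budget: {budget:.2f}")
```

A greedy baseline would admit every query until the budget is exhausted; the history-derived threshold above is one simple way to avoid spending the budget on early, low-utility queries, which is the shortsightedness the paper attributes to greedy allocation.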