Abstract: Deep learning models have become ubiquitous across myriad application areas due to their remarkable performance. This success would not be possible without high-quality training datasets contributed by numerous data owners. These datasets are not only valuable assets for their owners but also contain sensitive information, raising concerns about privacy leakage. This gives rise to an urgent question: can data owners verify that model developers have stopped using their datasets immediately upon receiving a data deletion request, as mandated by right-to-be-forgotten regulations? In this paper, we provide an affirmative answer by proposing DuplexGuard, a novel framework for deletion right verification via a duplex watermarking approach. During watermark injection, for each owner's dataset, DuplexGuard generates duplex subsets of watermarked samples, i.e., the ambush subset and the surfacing subset. This duplex design produces distinct combinations of watermark behaviors before and after data deletion, thereby allowing it to signal every possible dataset usage status. DuplexGuard further introduces a new two-way handshake protocol for issuing data deletion requests, enabling more robust and decisive verification of the deletion right. Extensive experiments on multiple benchmark datasets demonstrate that DuplexGuard is effective and reliable in verification.
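The duplex idea can be made concrete with a small decision table over the two watermarks' responses. The sketch below is purely illustrative: the abstract does not specify how the ambush and surfacing watermarks behave, so the `WatermarkProbe` class, the `infer_usage_status` function, and the specific mapping from response combinations to usage statuses are hypothetical assumptions, not DuplexGuard's actual algorithm.

```python
# Illustrative sketch only. The semantics assigned to the ambush and
# surfacing watermarks below are assumptions made for exposition; the
# paper's abstract only states that their *combination* distinguishes
# all dataset usage statuses.

from dataclasses import dataclass


@dataclass
class WatermarkProbe:
    """Results of querying a suspect model with the two watermark triggers."""
    ambush_hit: bool      # model reacts to the ambush-subset trigger
    surfacing_hit: bool   # model reacts to the surfacing-subset trigger


def infer_usage_status(probe: WatermarkProbe) -> str:
    """Map the duplex watermark responses to a dataset usage status.

    Two complementary watermarks yield four response combinations,
    which is what lets the duplex design cover usage states that a
    single watermark cannot separate (hypothetical mapping below).
    """
    if probe.ambush_hit and probe.surfacing_hit:
        return "dataset still in use (deletion request ignored)"
    if probe.ambush_hit and not probe.surfacing_hit:
        return "dataset deleted as requested"
    if not probe.ambush_hit and probe.surfacing_hit:
        return "inconsistent response; flag for further auditing"
    return "dataset never used for training"
```

The design rationale suggested by the abstract is that a single watermark offers only a present/absent signal, whereas the duplex pair exposes four distinguishable states before and after deletion, which is what makes the verification decisive rather than merely suggestive.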