Abstract: Governments worldwide have enforced laws and regulations to protect digital data and ensure privacy, and data owners have thus been granted the right to be forgotten. To comply, companies must take reasonable steps to delete user data not only from databases but also from any AI models trained on it. Model owners may also need to remove data that harms a model’s utility. In both cases, removing the data efficiently through selective forgetting is crucial. Although selective forgetting has been studied extensively in classification, its application to document object detection remains underexplored. This paper introduces a novel selective unlearning framework that combines enhanced Sharded, Isolated, Sliced, and Aggregated (SISA) training with an Enhanced Weighted Box Fusion (WBF) strategy. By partitioning datasets into isolated shards and slices, we enable localized unlearning through selective sub-model retraining while mitigating performance loss via ensemble aggregation techniques. Experiments on the ICDAR 2017 POD and Invoices datasets demonstrate that our approach achieves better aggregation performance than standard WBF and soft Non-Maximum Suppression (sNMS), striking a balance between unlearning efficiency and model accuracy. These findings provide insights into scalable, privacy-preserving document AI for real-world applications.
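As a rough illustration of the shard-and-slice idea in the abstract, the sketch below shows how a deletion request can be localized to one sub-model. It follows the general SISA scheme, in which each shard trains its own sub-model slice by slice with a checkpoint after every slice. It is not the paper's enhanced variant: the function names (`partition`, `unlearn`) and the round-robin assignment are assumptions made for illustration only.

```python
def partition(sample_ids, num_shards, num_slices):
    """Assign samples round-robin to shards, then to slices within a shard.

    Returns shards[s][r], the list of sample ids in slice r of shard s.
    The round-robin rule is an illustrative assumption, not the paper's.
    """
    shards = [[[] for _ in range(num_slices)] for _ in range(num_shards)]
    for i, sid in enumerate(sample_ids):
        shard = i % num_shards
        slice_ = (i // num_shards) % num_slices
        shards[shard][slice_].append(sid)
    return shards


def unlearn(shards, sample_id):
    """Drop one sample and report what has to be retrained.

    Returns (shard index, list of slice indices to retrain). Only the
    sub-model of the affected shard is touched; it can restart from the
    checkpoint saved before the affected slice, so earlier slices and
    all other shards are left untouched.
    """
    for s, slices in enumerate(shards):
        for r, ids in enumerate(slices):
            if sample_id in ids:
                ids.remove(sample_id)
                return s, list(range(r, len(slices)))
    raise KeyError(f"sample {sample_id!r} not found in any shard")


if __name__ == "__main__":
    shards = partition(range(12), num_shards=3, num_slices=2)
    shard, slices_to_retrain = unlearn(shards, 4)
    print(f"retrain shard {shard}, slices {slices_to_retrain}")
```

Under these assumptions, the cost of a deletion depends on the position of the sample inside its shard: a sample in the last slice triggers retraining on that slice only, while one in the first slice forces the whole shard's sub-model to be retrained. Detections from the sub-models would then be combined by an aggregation step such as WBF.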
External IDs: doi:10.1007/978-3-032-04617-8_5