Abstract: Neural text classification models are known to exploit statistical patterns during supervised learning. However, such patterns include spurious correlations and superficial regularities in the training data. In this paper, we exaggerate superficial regularity in text to prevent unauthorized exploitation of personal data. We propose a gradient-based method to construct text modifications that make deep neural networks (DNNs) unlearnable. We then analyze the text modifications exposed by the gradient-based method and further propose two simple hypotheses for manually crafting unlearnable text. Experiments on four tasks (sentiment classification, topic classification, reading comprehension, and gender classification) validate the effectiveness of our method, by which models achieve almost untrained performance after training on unlearnable text.
Paper Type: long
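To make the gradient-based idea concrete, the following is a minimal numpy sketch of error-minimizing token substitution on a toy linear classifier. Everything here (the model, the vocabulary, the greedy first-order substitution rule, and all names such as `unlearnable_substitute`) is an illustrative assumption, not the paper's actual method or implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (sizes are arbitrary): a vocabulary of V tokens with
# d-dimensional embeddings, classified by a fixed linear model.
V, d = 20, 8
E = rng.normal(size=(V, d))   # token embedding table
w = rng.normal(size=d)        # weights of a toy linear classifier

def loss(tok_ids, y):
    """Logistic loss of a mean-of-embeddings linear classifier, y in {-1, +1}."""
    z = E[tok_ids].mean(axis=0) @ w
    return np.log1p(np.exp(-y * z))

def unlearnable_substitute(tok_ids, y):
    """Greedily replace each token with the vocabulary item whose
    first-order effect most *decreases* the loss. Error-minimizing
    edits like this leave little signal for a model to learn from."""
    z = E[tok_ids].mean(axis=0) @ w
    # gradient of the logistic loss w.r.t. the mean embedding
    g = -y * (1.0 / (1.0 + np.exp(y * z))) * w
    out = tok_ids.copy()
    for i in range(len(tok_ids)):
        # first-order loss change of swapping token i to each candidate
        scores = (E - E[tok_ids[i]]) @ g
        out[i] = int(np.argmin(scores))
    return out

tok_ids = rng.integers(0, V, size=5)
y = 1
mod = unlearnable_substitute(tok_ids, y)
```

Because the toy loss is monotone in the logit and the current token always scores zero, each greedy swap can only decrease (or preserve) the loss, so `loss(mod, y) <= loss(tok_ids, y)` holds by construction.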