Automated Compliance Checking for Chinese Privacy Policy: A New Task and Dataset

ACL ARR 2024 June Submission1155 Authors

14 Jun 2024 (modified: 03 Jul 2024), ACL ARR 2024 June Submission, CC BY 4.0
Abstract: Privacy policy texts inform users about how online service providers handle their personal data. However, these texts can be long, complex, and non-compliant with laws and regulations, so automated compliance checking is needed. In this paper, we introduce the first task and dataset for automated compliance checking of Chinese privacy policy texts. Our dataset provides human experts’ compliance annotations at both the document level and the fine-grained level. The fine-grained annotations cover the existing named entity recognition (NER) task and 11 new sentence classification (SC) tasks for compliance checking. We treat the NER and SC subtasks as discriminative legal attributes that can help models generate reliable compliance results and easy-to-understand explanations. We also further pre-train BERT-Chinese on a large corpus of compliance-related texts and evaluate it on all the tasks. Our results show that the further pre-trained BERT model outperforms the baseline models, demonstrating the potential of NLP techniques for automated compliance checking of privacy policies. Our dataset and the further pre-trained BERT model will be released soon.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: Automated Compliance Checking, Chinese, New Dataset, Task
Contribution Types: Publicly available software and/or pre-trained models, Data resources
Languages Studied: Chinese
Submission Number: 1155