Abstract: Most current software systems involve processing personal data, an activity that is regulated in Europe by the general data protection regulation (GDPR) through data processing agreements (DPAs). Developing compliant software requires adhering to DPA-related requirements in GDPR. Verifying the compliance of DPAs entirely manually is however time-consuming and error-prone. In this paper, we propose an automation strategy based on machine learning (ML) for checking GDPR compliance in DPAs. Specifically, we create, based on existing work, a comprehensive conceptual model that describes the information types pertinent to DPA compliance. We then develop an automated approach that detects breaches of compliance by predicting the presence of these information types in DPAs. On an evaluation set of 30 real DPAs, our approach detects 483 out of 582 genuine violations while introducing 93 false violations, achieving thereby a precision of 83.9% and recall of 83.0%. We empirically compare our approach against an existing approach which does not employ ML but relies on manually-defined rules. Our results indicate that the two approaches perform on par. Therefore, to select the right solution in a given context, we discuss differentiating factors like the availability of annotated data and legal experts, and adaptation to regulation changes.
Loading