Automating Website Registration for Studying GDPR Compliance

Published: 23 Jan 2024, Last Modified: 23 May 2024, TheWebConf24
Keywords: Crawling, Registration, Consent, GDPR, ePrivacy Directive, Compliance
TL;DR: We automate website registration and the detection of newsletter privacy violations for a large-scale GDPR+ compliance study.
Abstract: Investigating how websites use sensitive user data is an active research area. However, research based on automated measurements has been limited to websites that do not require user authentication. To overcome this limitation, we developed a crawler that automates website registrations and newsletter subscriptions and detects both security and privacy threats at scale. We demonstrate our crawler's capabilities by running it on 660k websites. We use the results to identify security and privacy threats and to contextualize them within the laws of the European Union, namely the General Data Protection Regulation and the ePrivacy Directive. Our methods detect private data collection over insecure HTTP connections and websites sending emails that contain user-provided passwords. We are also the first to apply machine learning to web forms, assessing violations of marketing consent collection requirements. Overall, we find that 37.2% of websites send marketing emails without proper user consent, mostly because they send a first marketing email right after the subscription. Additionally, 1.8% of websites share users' email addresses with third parties without transparent disclosure.
Track: Responsible Web
Submission Guidelines Scope: Yes
Submission Guidelines Blind: Yes
Submission Guidelines Format: Yes
Submission Guidelines Limit: Yes
Submission Guidelines Authorship: Yes
Student Author: Yes
Submission Number: 2473