Abstract: In this paper, we present and evaluate an automated pipeline for the large-scale analysis of corporate privacy policies. Organizations usually develop their privacy policies in isolation to best balance their business needs, user rights, as well as regulatory requirements. A wide-ranging and structured analysis of corporate privacy policies is essential to facilitate a deeper understanding of how organizations have balanced competing requirements. Our approach consists of a web crawler that can navigate to and scrape content from web pages that contain privacy policies, and a set of AI chatbot task prompts to process and extract structured/labeled annotations from the raw data. The analysis includes the types of collected user data, the purposes for which data is collected and processed, data retention and protection practices, and user rights and choices. Our validation shows that our annotations are highly accurate and consistent. We use this architecture to gather data on the privacy policies of companies in the Russell 3000 index, resulting in hundreds of thousands of annotations across all categories. Analysis of the resulting data allows us to obtain unique insights into the state of the privacy policy ecosystem as a whole.
Loading