Abstract: Legal notices are pervasive online. Digital spaces are littered with legally binding terms and policies that govern digital rights and shape access to justice. Yet many of those texts are opaque: difficult to comprehend and to study. Our research addresses that gap. First, we introduce the Multi-Genre Online Terms and Privacy Policies (MOTPP) dataset, a synchronic collection of the online terms and privacy policies of prominent digital platforms across nine genres. The dataset contains 835 texts and 5.89 million tokens. Second, we provide an interdisciplinary analysis that illustrates linguistic features of the corpus and presents machine learning tools for scrutinizing digital contracts at scale. Our exploratory application leverages machine learning and synthetic data to analyze content that matters most to consumers, focusing on terms that determine access to justice. The annotated dataset, models, and other resources for this paper are available on GitHub and Hugging Face.
Paper Type: Long
Research Area: Computational Social Science and Cultural Analytics
Research Area Keywords: Computational Social Science and Cultural Analytics, Ethics, Bias, and Fairness
Contribution Types: Data resources, Data analysis
Languages Studied: English
Submission Number: 867