Large-Scale Anonymized Text-based Disability Discourse Dataset

Published: 01 Jan 2023, Last Modified: 23 May 2025, Venue: ASSETS 2023, License: CC BY-SA 4.0
Abstract: The involvement of individuals with disabilities in online discussions related to disability and accessibility is a critical area of study. While previous research has qualitatively examined the participation of individuals with disabilities on social media platforms, large-scale analysis of social media content by people with disabilities remains underexplored. This paper presents a pioneering large-scale study of disability communities on Reddit. We developed an anonymized text-based dataset consisting of 1.5 million comments posted on three subreddits: r/disability, r/Blind, and r/ADHD. Using topic modeling, we analyzed the dataset and identified eight highly coherent categories, with their associated keywords, that are common across the three subreddits. We contribute the Anonymized Disability Discourse Reddit Corpus (ADDReC), a corpus of 1.5 million comments that features these eight disability discourse categories.