Using Noisy Self-Reports to Predict Twitter User Demographics

Anonymous

18 Nov 2019 (modified: 18 Nov 2019) | OpenReview Anonymous Preprint Blind Submission | Readers: Everyone

Keywords: twitter, demographics, selection bias, self-report

Abstract: Computational social science studies often contextualize content analysis within standard demographics. Because demographic attributes are unavailable on many social media platforms, such as Twitter, numerous studies have inferred demographic traits automatically. Although many studies have presented proof-of-concept inference of race and ethnicity, training practical systems remains elusive because few annotated datasets exist. Existing datasets are small, error-prone, or fail to cover the four most common racial and ethnic groups in the United States. We present a method to identify self-reports of race and ethnicity from Twitter profile descriptions. Despite the errors inherent in automated supervision, we train models that are sufficiently accurate at identifying demographics when evaluated against a gold-standard self-report survey. The result is a reproducible method for creating large-scale training resources for race and ethnicity.
