An algorithm for controlled text analysis on Wikipedia


28 May 2020 (modified: 28 May 2020) | OpenReview Anonymous Preprint | Blind Submission | Readers: Everyone
  • Keywords: Wikipedia, computational social science, text analysis, bias, social bias
  • TL;DR: We provide and evaluate algorithms for matching Wikipedia pages based on shared traits, which can be used to analyze bias on Wikipedia in a controlled setting.
  • Abstract: While numerous studies have examined bias on Wikipedia, most approaches fail to control for possible confounding variables. In this work, given a target corpus for analysis (e.g., biography pages about women), we present a method for constructing a control corpus that matches the target corpus in as many attributes as possible, except for the target attribute (e.g., the gender of the subject). This methodology can be used to analyze specific types of bias in Wikipedia articles, such as gender or racial bias, while minimizing the influence of confounding variables.
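
The matching idea in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes each page is represented by a set of attribute labels (for instance, Wikipedia category names), that the candidate pool already excludes pages with the target attribute, and it swaps in greedy one-to-one matching by Jaccard overlap as the similarity measure. The names `jaccard` and `build_control_corpus` are hypothetical.

```python
from typing import Dict, Optional, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap between two attribute sets (0.0 if both are empty)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def build_control_corpus(
    target: Dict[str, Set[str]],
    candidates: Dict[str, Set[str]],
) -> Dict[str, Optional[str]]:
    """Greedily pair each target page with its most similar unused candidate.

    Assumption: `candidates` contains only pages that lack the target
    attribute (e.g. biographies of men when the target corpus is
    biographies of women). A target page with no overlapping candidate
    is mapped to None.
    """
    control: Dict[str, Optional[str]] = {}
    used: Set[str] = set()
    for t_page, t_attrs in target.items():
        best: Optional[str] = None
        best_score = 0.0
        for c_page, c_attrs in candidates.items():
            if c_page in used:
                continue  # match without replacement
            score = jaccard(t_attrs, c_attrs)
            if score > best_score:
                best, best_score = c_page, score
        control[t_page] = best
        if best is not None:
            used.add(best)
    return control
```

Matching without replacement keeps the control corpus the same size as the matched part of the target corpus, at the cost of depending on the order in which target pages are processed; an optimal assignment or a weighted similarity would be reasonable alternatives.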
