Why is AI "a sea of dudes"? Using data science and NLP methods to understand gender imbalance in a scientific community.

15 Sept 2021 · OpenReview Archive Direct Upload · Readers: Everyone
Abstract: This dissertation carries out an in-depth study of gender in the field of Computational Linguistics. Our approach relies heavily on information extracted directly from the data, using tools that the very field we are investigating promotes. We perform gender attribution on the authors in a corpus and investigate new gender classification methods, including character-level LSTMs and face recognition. We then perform a quantitative analysis of the publication patterns of these authors, focusing on career development over time, collaboration through coauthorship, and conference rankings. Most of our results are statistically significant and help paint a picture of the field. We find that women are underrepresented in the last author position. Moreover, men have more active years in the field and more publications per active year. In terms of collaboration, women tend to coauthor more papers with other women. Another concerning finding is that women are underrepresented at the highest-ranked conferences. We employ topic modeling to capture how shifts in the field of Computational Linguistics affect the gender gap, and we contrast this with earlier findings. We report significant differences in the topics that each gender is more likely to choose. Finally, we examine the effect of an online publishing repository (arXiv) as opposed to a traditional corpus (ACL). Our analysis suggests that gender differences can arise in subtle ways in scholarly authorship, and practitioners should be aware of the dangers of unconscious gender bias.
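The abstract reports that women are underrepresented in the last author position and that most results are statistically significant, but it does not say which test was used. As an illustration only, the sketch below assumes a two-sided two-proportion z-test comparing the share of women among last-author slots with their share among all other author slots. The function name `two_proportion_z_test` and this framing are hypothetical and are not taken from the dissertation.

```python
import math


def two_proportion_z_test(k1: int, n1: int, k2: int, n2: int) -> tuple[float, float]:
    """Two-sided two-proportion z-test using a pooled standard error.

    Hypothetical framing: k1 of n1 last-author slots are held by women,
    and k2 of n2 other author slots are held by women.
    Returns (z, p_value); a negative z means the first proportion is lower.
    """
    if min(n1, n2) <= 0:
        raise ValueError("sample sizes must be positive")
    p1, p2 = k1 / n1, k2 / n2
    # Pooled proportion under the null hypothesis that both shares are equal.
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("pooled proportion is 0 or 1; the test is undefined")
    z = (p1 - p2) / se
    # Two-sided p-value: P(|Z| >= |z|) = erfc(|z| / sqrt(2)) for a standard normal Z.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

Called with counts extracted from an attributed corpus (for example, `two_proportion_z_test(k1, n1, k2, n2)`), a negative `z` with a small `p_value` would indicate that women hold a smaller share of last-author slots than of other slots. A chi-squared test on the same 2×2 table is an equivalent alternative.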