Abstract: This paper proposes a new method for estimating a direction in a word embedding space corresponding to an interpretable semantic property such as gender, race, or religion. Our technique assumes that words can be assigned numerical scores that quantify their association with the target property. We estimate the subspace by maximizing the covariance or correlation between these scores and the projections of the word embeddings onto the subspace. Using our technique, we show that word embedding spaces in English, French, and Chinese contain subspaces that encode gender, race, religion, sentiment, word length, and national population. We then apply our technique to the mitigation of gender and racial bias in word embeddings. We find that using our technique to estimate a gender or race subspace improves performance on several benchmarks.
Paper Type: long
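The estimation step described in the abstract can be illustrated with a minimal sketch for the one-dimensional case. This is not the paper's implementation: the function names, the small `ridge` term, and the synthetic data are assumptions made for illustration. The sketch relies only on two standard closed forms: for a unit vector w, the covariance between the scores and the projections is maximized by the centered cross-covariance vector, and the correlation is maximized by the ordinary least-squares direction.

```python
import numpy as np


def max_covariance_direction(X, y):
    """Unit vector w maximizing cov(X @ w, y) (hypothetical helper)."""
    Xc = X - X.mean(axis=0)   # center the embeddings
    yc = y - y.mean()         # center the scores
    w = Xc.T @ yc             # proportional to the cross-covariance vector
    return w / np.linalg.norm(w)


def max_correlation_direction(X, y, ridge=1e-8):
    """Unit vector w maximizing corr(X @ w, y) (hypothetical helper).

    The maximizer is the least-squares direction; the small ridge term
    is an assumption added only for numerical stability.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    d = X.shape[1]
    w = np.linalg.solve(Xc.T @ Xc + ridge * np.eye(d), Xc.T @ yc)
    return w / np.linalg.norm(w)


# Synthetic stand-in for embeddings and property scores (assumed data):
# the scores depend on the first coordinate only, plus a little noise.
rng = np.random.default_rng(0)
n_words, dim = 200, 5
true_direction = np.zeros(dim)
true_direction[0] = 1.0
X = rng.normal(size=(n_words, dim))
y = X @ true_direction + 0.1 * rng.normal(size=n_words)

w_cov = max_covariance_direction(X, y)
w_corr = max_correlation_direction(X, y)
```

On this toy data both estimates should lie close to the first coordinate axis. With real embeddings, whose dimensions are correlated and unequally scaled, the two objectives can yield noticeably different directions.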