Abstract: This paper introduces EBIE (Evolutionary Bias Identification with Embeddings), a new method for tackling algorithmic bias in natural language processing (NLP) tasks. The method applies an evolutionary algorithm to the representational power of word embeddings, focusing on classification tasks. By tracking shifts in individual embedding dimensions over generations, EBIE identifies which parts of the embedding are most responsive to the genetic operations. These insights reveal critical features that influence model decisions and expose latent biases embedded within NLP classifiers. Through correlation analysis between individual tokens and classification scores, EBIE uncovers systematic biases in model behavior, such as reliance on stereotypical markers and neglect of nuanced expressions. By surfacing these tendencies, our methodology provides actionable insight for refining model training, enhancing fairness, and improving robustness. Its flexibility ensures broad applicability across NLP tasks, offering a versatile framework for building more equitable and transparent machine learning systems.
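The two core mechanisms described above can be illustrated in miniature. The sketch below is a hypothetical, simplified stand-in for EBIE, not the paper's implementation: a toy linear scorer plays the role of the NLP classifier, a mutation-only evolutionary loop perturbs an embedding while accumulating per-dimension shift magnitudes (the "most responsive" dimensions), and a final step computes the correlation between a token-presence indicator and classification scores. All names, dimensions, and the synthetic data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for an NLP classifier: a fixed linear scorer
# over an 8-dimensional embedding (weights are arbitrary).
weights = rng.normal(size=8)

def score(emb):
    return float(emb @ weights)

def evolve(emb, generations=50, pop=20, sigma=0.05):
    """Mutation-only evolutionary search over the embedding.

    Each generation proposes `pop` Gaussian mutations; when a mutant
    improves the score, the per-dimension shift it caused is added to
    `shifts`, so dimensions that move often and far accumulate large
    values -- the embedding regions most responsive to the genetic
    operations.
    """
    best, best_s = emb.copy(), score(emb)
    shifts = np.zeros_like(emb)
    for _ in range(generations):
        cands = best + rng.normal(scale=sigma, size=(pop, emb.size))
        scores = cands @ weights
        i = int(np.argmax(scores))
        if scores[i] > best_s:
            shifts += np.abs(cands[i] - best)  # dimensional change this step
            best, best_s = cands[i].copy(), float(scores[i])
    return best, shifts

emb0 = rng.normal(size=8)
best, shifts = evolve(emb0)
# Rank dimensions by accumulated shift: candidates for "critical features".
responsive_dims = np.argsort(shifts)[::-1]

# Token-score correlation analysis on synthetic data: `token_present`
# marks whether a (stereotypical) marker token occurs in each example,
# `scores` are the classifier outputs. A strong correlation would flag
# the token as a systematic driver of model behavior.
token_present = rng.integers(0, 2, size=100).astype(float)
scores = 0.8 * token_present + rng.normal(scale=0.2, size=100)
r = float(np.corrcoef(token_present, scores)[0, 1])
```

Under this setup, `score(best) >= score(emb0)` always holds (the search only accepts improvements), and `r` comes out strongly positive because the synthetic scores were built to depend on the token.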