Abstract: Causal relationship extraction is an important task in natural language processing: it determines whether a causal relationship holds between entities or events described in text, and it supports downstream tasks such as causal graph construction and causal inference. However, when labels are unevenly distributed in a dataset, spurious associations can arise between entities and labels and bias the entity representations. A model may then base its causal judgments on these biased entity representations, which harms its generalization ability. To address this issue, a strategy is proposed that mitigates entity bias by intervening on entity representations. First, entity representations are extracted from the sentence representations; Principal Component Analysis is then applied to obtain a semantic subspace. This subspace is used to remove the semantic information in entity representations that acts as a confounding factor, while retaining more contextual information. To keep the evaluation fair, the dataset is divided so that entity pairs do not overlap between the training and test sets, which prevents biases learned from the training set from transferring to the test set. Experiments on three datasets show that the entity representation bias adjustment strategy alleviates entity representation bias, improving the average F1 score by approximately 1.44% over the baseline model. Compared with other methods, its largest improvement is approximately 5.93%. Moreover, results on three different baseline models demonstrate that the approach is model-agnostic.
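The entity pair non-overlapping division mentioned above can be illustrated with a minimal sketch. The abstract does not specify the exact procedure, so everything here is an assumption: the function name `split_by_entity_pair`, the dictionary keys `head`, `tail`, and `label`, the treatment of a pair as ordered, and the toy examples are all hypothetical. The idea shown is only that whole groups of examples sharing an entity pair are assigned to one side of the split, so no pair seen in training reappears at test time.

```python
import random
from collections import defaultdict


def split_by_entity_pair(examples, test_fraction=0.2, seed=0):
    """Split examples so that no (head, tail) pair occurs in both sets.

    `examples` is a list of dicts with the hypothetical keys
    "head", "tail", and "label". Pairs are treated as ordered
    (an assumption, not something the abstract states).
    """
    # Group the examples by their entity pair.
    groups = defaultdict(list)
    for ex in examples:
        groups[(ex["head"], ex["tail"])].append(ex)

    # Shuffle the pairs, not the examples, so each group moves as a unit.
    pairs = sorted(groups)
    random.Random(seed).shuffle(pairs)

    # Fill the test set group by group until it reaches the target size;
    # every remaining group goes to the training set.
    target = test_fraction * len(examples)
    train, test = [], []
    for pair in pairs:
        bucket = test if len(test) < target else train
        bucket.extend(groups[pair])
    return train, test


# Toy data, invented purely for illustration.
examples = [
    {"head": "smoking", "tail": "cancer", "label": 1},
    {"head": "smoking", "tail": "cancer", "label": 1},
    {"head": "smoking", "tail": "cancer", "label": 0},
    {"head": "rain", "tail": "flood", "label": 1},
    {"head": "rain", "tail": "flood", "label": 1},
    {"head": "virus", "tail": "fever", "label": 1},
    {"head": "stress", "tail": "insomnia", "label": 0},
    {"head": "earthquake", "tail": "tsunami", "label": 1},
]

train, test = split_by_entity_pair(examples, test_fraction=0.25, seed=0)
train_pairs = {(ex["head"], ex["tail"]) for ex in train}
test_pairs = {(ex["head"], ex["tail"]) for ex in test}
print("shared pairs:", train_pairs & test_pairs)
```

Because groups are moved whole, the test set may end up somewhat larger than the requested fraction when a large group is the one that crosses the target; a stricter size guarantee would need a different assignment rule.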