Causal Feature Selection in the Presence of Sample Selection Bias

Published: 01 Jan 2023, Last Modified: 27 Jul 2024. ACM Trans. Intell. Syst. Technol. 2023. License: CC BY-SA 4.0
Abstract: Almost all existing causal feature selection methods ignore the problem of sample selection bias. In practice, however, the data-gathering process cannot be fully controlled, so sample selection bias often occurs. It induces spurious correlations between features and the class variable, which seriously degrade the performance of existing methods. In this article, we study causal feature selection under sample selection bias and propose a novel Progressive Causal Feature Selection (PCFS) algorithm with three phases. First, PCFS learns sample weights that balance the treated-group and control-group distributions corresponding to each feature, in order to remove spurious correlations. Second, using these sample weights, PCFS fits a weighted cross-entropy model to estimate the causal effect of each feature and removes some irrelevant features from the confounder set. Third, PCFS progressively repeats the first two phases to remove more irrelevant features, finally obtaining a causal feature set. Experiments on synthetic and real-world datasets validate the effectiveness of PCFS in comparison with several state-of-the-art classical and causal feature selection methods.
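The abstract does not say how PCFS learns its sample weights, how the weighted cross-entropy model yields a causal effect, or how a feature is judged irrelevant. The sketch below fills those gaps with its own assumptions and is not the authors' implementation. Each binary feature is treated in turn as the treatment, and the remaining candidate features act as its confounder set. Balancing weights come from inverse propensity weighting, a swapped-in technique. The effect is the difference in predicted outcome probability from a weighted logistic regression, which is a weighted cross-entropy fit. Features whose absolute effect falls below a hypothetical `threshold` are dropped before the next round. The function names, the threshold, and the toy data are all hypothetical.

```python
import numpy as np


def fit_weighted_logistic(X, y, w, n_iter=25, jitter=1e-6):
    """Minimise sum_i w_i * cross_entropy(y_i, sigmoid(x_i @ beta)) by Newton-Raphson.
    An intercept column is prepended to X; a tiny diagonal jitter keeps the Hessian invertible."""
    Xb = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ beta))
        grad = Xb.T @ (w * (p - y))
        hess = (Xb * (w * p * (1 - p))[:, None]).T @ Xb + jitter * np.eye(Xb.shape[1])
        beta -= np.linalg.solve(hess, grad)
    return beta


def balancing_weights(t, Z):
    """Phase 1 (assumed form): inverse propensity weights that balance the treated (t=1)
    and control (t=0) groups over the confounders Z."""
    beta = fit_weighted_logistic(Z, t, np.ones(len(t)))
    e = 1.0 / (1.0 + np.exp(-(beta[0] + Z @ beta[1:])))
    e = np.clip(e, 0.01, 0.99)  # avoid extreme weights
    return np.where(t == 1, 1.0 / e, 1.0 / (1.0 - e))


def weighted_effect(t, y, w):
    """Phase 2 (assumed form): fit a weighted logistic model of y on t and report
    P(y=1 | t=1) - P(y=1 | t=0) under that model."""
    b = fit_weighted_logistic(t[:, None], y, w)
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    return sigmoid(b[0] + b[1]) - sigmoid(b[0])


def pcfs_sketch(X, y, threshold=0.1, max_rounds=10):
    """Phase 3 (assumed form): repeat phases 1-2, shrinking the candidate set
    until no feature is removed."""
    selected = list(range(X.shape[1]))
    for _ in range(max_rounds):
        effects = {}
        for j in selected:
            # Confounder set for feature j: all other features still selected.
            others = np.array([k for k in selected if k != j], dtype=int)
            w = balancing_weights(X[:, j], X[:, others])
            effects[j] = weighted_effect(X[:, j], y, w)
        keep = [j for j in selected if abs(effects[j]) >= threshold]
        if keep == selected:
            break
        selected = keep
    return selected


def make_toy_data(n=4000, seed=0):
    """Hypothetical toy data: X0 causes Y, X1 is only correlated with X0, X2 is noise."""
    rng = np.random.default_rng(seed)
    x0 = rng.integers(0, 2, n)
    x1 = np.where(rng.random(n) < 0.8, x0, 1 - x0)
    x2 = rng.integers(0, 2, n)
    y = np.where(rng.random(n) < 0.9, x0, 1 - x0).astype(float)
    return np.column_stack([x0, x1, x2]).astype(float), y


print(pcfs_sketch(*make_toy_data()))
```

With a binary treatment and only T in the outcome model, the weighted logistic fit reproduces the weighted outcome mean of each group, so the effect reduces to a normalised inverse-propensity-weighted mean difference. In the toy data, X1 is associated with Y only through X0. Reweighting on the other features should therefore remove its spurious correlation, while X2 carries no signal at all.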