Causal Feature Selection With Imbalanced Data

Published: 01 Jan 2025. Last Modified: 22 Jul 2025. IEEE Trans. Emerg. Top. Comput. Intell. 2025. License: CC BY-SA 4.0
Abstract: Causal feature selection, an emerging topic, has drawn increasing attention in causal discovery and machine learning. However, existing causal feature selection algorithms do not account for class imbalance, which reduces the accuracy with which they measure the relationships between the class variable and the features on imbalanced data. To address this issue, we theoretically analyze the weighted mutual information relationships between the class variable and its causal features in terms of feature relevance, redundancy, and complementarity, and propose a Weighted-Causal Feature Selection algorithm (W-CFS). Specifically, W-CFS uses weighted mutual information to discriminate between minority and majority classes, so that it can accurately discover the Markov blanket of the class variable with imbalanced data. Comprehensive experiments on imbalanced benchmark Bayesian network datasets and imbalanced real-world datasets demonstrate that the proposed algorithm achieves better accuracy and comparable efficiency compared with state-of-the-art competitors.
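The abstract does not give the exact form of the weighted mutual information used by W-CFS. The sketch below only illustrates the general idea under stated assumptions: it uses a per-term weighted mutual information, WMI(X; Y) = sum over (x, y) of w(y) p(x, y) log(p(x, y) / (p(x) p(y))), with a hypothetical inverse-frequency class weight w(y) = 1 / (K p(y)), where K is the number of classes. Both the weighting scheme and the function name `weighted_mutual_information` are illustrative assumptions, not the paper's definition.

```python
import math
from collections import Counter


def weighted_mutual_information(feature, labels, weighted=True):
    """Class-weighted mutual information between a discrete feature X and class Y.

    ASSUMPTION (not the W-CFS definition): each (x, y) term of ordinary
    mutual information is multiplied by w(y) = 1 / (K * p(y)), where K is the
    number of observed classes. Minority classes therefore get larger weights,
    and on perfectly balanced data every w(y) = 1, recovering plain MI.
    Set weighted=False to compute ordinary mutual information (in nats).
    """
    if not labels or len(feature) != len(labels):
        raise ValueError("feature and labels must be non-empty and of equal length")

    n = len(labels)
    # Empirical joint and marginal probabilities from counts.
    p_xy = {key: c / n for key, c in Counter(zip(feature, labels)).items()}
    p_x = {key: c / n for key, c in Counter(feature).items()}
    p_y = {key: c / n for key, c in Counter(labels).items()}
    k_classes = len(p_y)

    total = 0.0
    for (x, y), pxy in p_xy.items():
        # Hypothetical inverse-frequency weight that emphasizes minority classes.
        w = 1.0 / (k_classes * p_y[y]) if weighted else 1.0
        total += w * pxy * math.log(pxy / (p_x[x] * p_y[y]))
    return total


# Toy imbalanced data: 8 majority-class samples (0) and 2 minority-class samples (1).
y = [0] * 8 + [1] * 2
x = list(y)  # a feature that perfectly tracks the class
print(weighted_mutual_information(x, y))                  # weighted score
print(weighted_mutual_information(x, y, weighted=False))  # plain MI
```

On this 80/20 toy example, the perfectly predictive feature scores ln 2.5 ≈ 0.916 under the assumed weighting versus about 0.500 for plain mutual information, because the minority-class term (ln 5) is up-weighted by w(1) = 2.5 while the majority-class term is down-weighted by w(0) = 0.625. The 1 / K normalization is a design choice that makes the measure coincide with ordinary mutual information when the classes are balanced.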