Safe Data Resampling Method Based on Counterfactuals Analysis

Published: 01 Jan 2024, Last Modified: 01 Aug 2025, ICANN (1) 2024, CC BY-SA 4.0
Abstract: Limited and imbalanced data can significantly degrade model performance, which strongly motivates the development of robust data resampling strategies. However, existing resampling methods generally neglect the fact that different samples and features differ in importance, which can lead to irrelevant or incorrect resampled data. Counterfactual analysis aims to identify the minimum feature changes required to flip a model decision. This makes it possible to measure precisely the impact of each feature on the decision and to evaluate how easily the prediction for a single sample can be flipped. Inspired by this, we propose two types of safeness evaluation metrics based on counterfactual instances, which measure the safeness of features and of samples, respectively. High-quality data resampling can then be achieved by selecting safe features and samples, or by changing feature values within safe intervals. In addition, the proposed safeness evaluation metrics can be seamlessly integrated into existing data resampling methods to further enhance their performance. Experimental results show that our resampling method improves data diversity while reducing the noise introduced by resampled data, thereby achieving safe resampling.
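The idea of a counterfactual as the minimum feature change that flips a decision can be illustrated with a small sketch. It assumes a linear classifier, for which the closest counterfactual under the Euclidean norm has a closed form; the abstract does not specify the model or the exact safeness metrics, so the function names (`counterfactual`, `flip_distance`, `feature_changes`) and the use of flip distance as a stand-in for sample safeness are illustrative assumptions, not the paper's definitions.

```python
import math

# Hypothetical linear classifier: predict 1 when w.x + b >= 0, else 0.
def score(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict(w, b, x):
    return 1 if score(w, b, x) >= 0 else 0

def counterfactual(w, b, x, margin=1e-6):
    """Closest point (L2 norm) on the other side of the decision boundary.

    Assumption: for a linear model the minimal change moves x along w
    until the score lands just past zero (by `margin`).
    """
    s = score(w, b, x)
    norm_sq = sum(wi * wi for wi in w)
    target = -margin if s >= 0 else margin
    step = (target - s) / norm_sq
    return [xi + step * wi for xi, wi in zip(x, w)]

def flip_distance(x, x_cf):
    """Illustrative per-sample proxy: a larger distance means harder to flip."""
    return math.sqrt(sum((a - c) ** 2 for a, c in zip(x, x_cf)))

def feature_changes(x, x_cf):
    """Illustrative per-feature proxy: how much each feature must move."""
    return [abs(a - c) for a, c in zip(x, x_cf)]

if __name__ == "__main__":
    w, b = [3.0, 4.0], -5.0
    for x in ([5.0, 5.0], [1.0, 1.0]):
        x_cf = counterfactual(w, b, x)
        print(x, predict(w, b, x), predict(w, b, x_cf),
              flip_distance(x, x_cf), feature_changes(x, x_cf))
```

In this sketch, a sample far from the decision boundary needs a larger change to flip and would be treated as safer to resample from, while the per-feature changes indicate which features the decision is most sensitive to. A non-linear model would need an iterative counterfactual search instead of the closed form used here.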