A Progressive Sampling Method for Dual-Node Imbalanced Learning with Restricted Data Access

Published: 2023, Last Modified: 30 Sept 2024, Venue: ICDM 2023, License: CC BY-SA 4.0
Abstract: Class imbalance, characterised by disproportionate class distributions, impedes the effectiveness of learning algorithms, particularly when available data is scarce. Although external data sources can alleviate these challenges, full access to them is often prevented by privacy regulations or a lack of annotations, which further complicates the imbalanced learning problem. Moreover, exploiting all the data from an external node can be inefficient because of data redundancy and computational constraints. To address these issues, this paper introduces a method for imbalanced learning with restricted data access. We propose a data selection method that draws balanced data from the data-rich but access-restricted node, prioritising diversity, informativeness and balance. This strategy reduces the need for exhaustive data exploration and promotes efficient use of the available data. To further enhance the robustness of data selection, we present an iterative method that progressively selects balanced data. Each iteration trains a fully supervised model on the data-shortage node and a contrastive model on the data-rich node, incrementally refining the balance of the selected data. In addition, our method uses prediction entropy to generate the weights for training the contrastive models automatically, removing the need to specify them manually. We validate our approach through extensive experiments, which show that it addresses the challenges of imbalanced learning under restricted data access, improving data utilisation, balance and representation in imbalanced learning scenarios. The code is available on GitHub at https://github.com/uqyqiu/CPSL.
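The abstract names class balance, informativeness, entropy-derived weights and progressive (multi-round) selection but gives no formulas. The Python below is a minimal sketch under stated assumptions, not the CPSL implementation. It assumes the supervised model trained on the data-shortage node outputs softmax probabilities for the unlabelled samples on the data-rich node. It uses argmax pseudo-labels to enforce an equal per-class quota and ranks candidates by prediction entropy as a proxy for informativeness. It maps entropy to a weight by dividing by log C. The function names, the `per_class` budget, the `prob_fn` retraining callback and the weight formula are all hypothetical. Diversity is not modelled here.

```python
import math
from collections import defaultdict


def prediction_entropy(probs):
    """Shannon entropy (natural log) of one softmax probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_weights(prob_rows):
    """Map each sample's entropy to a weight in [0, 1] by dividing by log(C).

    Hypothetical mapping: the abstract only states that prediction entropy
    generates the contrastive-training weights; the paper's formula may differ.
    Assumes at least two classes so that log(C) > 0.
    """
    if not prob_rows:
        return []
    max_entropy = math.log(len(prob_rows[0]))
    return [prediction_entropy(row) / max_entropy for row in prob_rows]


def select_balanced(prob_rows, per_class, exclude=frozenset()):
    """Pick up to `per_class` indices per pseudo-label, most uncertain first."""
    buckets = defaultdict(list)
    for idx, row in enumerate(prob_rows):
        if idx in exclude:
            continue
        pseudo_label = max(range(len(row)), key=row.__getitem__)
        buckets[pseudo_label].append((prediction_entropy(row), idx))
    selected = []
    for label in sorted(buckets):
        # Highest entropy first: treat uncertain samples as most informative.
        ranked = sorted(buckets[label], reverse=True)
        selected.extend(idx for _, idx in ranked[:per_class])
    return sorted(selected)


def progressive_select(prob_fn, num_rounds, per_class):
    """Grow the selected set over several rounds.

    `prob_fn(selected)` is a hypothetical callback standing in for retraining
    the models on the current selection and re-predicting probabilities.
    """
    selected = set()
    for _ in range(num_rounds):
        probs = prob_fn(selected)
        selected.update(select_balanced(probs, per_class, exclude=selected))
    return sorted(selected)


probs = [[0.9, 0.1], [0.6, 0.4], [0.55, 0.45], [0.2, 0.8]]
print(select_balanced(probs, per_class=1))  # [2, 3]
print(progressive_select(lambda sel: probs, num_rounds=2, per_class=1))  # [1, 2, 3]
```

The equal per-class quota is what keeps each round balanced even when the pseudo-labels are skewed. In a real pipeline `prob_fn` would retrain on the growing selection, so later rounds would see updated probabilities rather than the fixed list used in this toy example.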