Fast Stochastic Recursive Momentum Methods for Imbalanced Data Mining

Xidong Wu, Feihu Huang, Heng Huang

Published: 2022, Last Modified: 29 Sept 2023, ICDM 2022, Readers: Everyone

Abstract: Standard deep learning models have mainly been designed for balanced data mining tasks and use accuracy to evaluate the classifier. However, in many real-world applications, the distribution of data is skewed. If standard models, which are designed to optimize accuracy, are applied to imbalanced data, the prediction performance can be poor because the model is biased towards the majority class. To address the imbalanced data mining problem, the area under the precision-recall curve (AUPRC) was proposed as a good measure for evaluating the performance of prediction models on imbalanced data sets, and it shows excellent capability in identifying models with high predictive power. To improve model performance, researchers have recently designed methods that directly optimize AUPRC for imbalanced data mining. However, these approaches suffer from high iteration complexity, and more efficient methods are desired. In this paper, we propose a faster stochastic method (i.e., ROAP) for maximizing AUPRC based on a momentum-based variance reduction technique. Our new method is based on the maximization of non-parametric average precision (AP), which is a popular unbiased point estimator of AUPRC. The optimization objective in this paper can be converted into a sum of dependent compositional functions, where the inner functions rely on random variables of both the inner and outer levels. Compared to previous methods, our ROAP algorithm achieves a lower iteration complexity of $O(\epsilon^{-3})$ for finding an $\epsilon$-stationary solution. Furthermore, we extend our method to an adaptive version (i.e., AROAP) with the same iteration complexity of $O(\epsilon^{-3})$. To the best of our knowledge, this paper is the first work showing that variance reduction can be incorporated into AUPRC maximization for efficient data mining on imbalanced data sets. Finally, we conduct extensive experiments on various imbalanced data sets with different models to demonstrate the efficiency of our new algorithms.
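The abstract refers to non-parametric average precision (AP) as the quantity being maximized. As a rough illustration only, the sketch below evaluates AP in its commonly used form: for each positive example, take the fraction of positives among all examples scored at least as high, then average over the positives. The function name `average_precision` and the toy scores are assumptions made for this example, not taken from the paper, and the sketch only evaluates AP; it does not implement ROAP or AROAP.

```python
def average_precision(scores, labels):
    """Evaluate non-parametric average precision (illustrative sketch).

    scores: list of real-valued model outputs, one per example.
    labels: list of 0/1 labels, where 1 marks the (minority) positive class.
    """
    positives = [i for i, y in enumerate(labels) if y == 1]
    if not positives:
        raise ValueError("average precision needs at least one positive example")

    total = 0.0
    for i in positives:
        # Examples scored at least as high as positive i (includes i itself).
        at_or_above = [s for s in range(len(scores)) if scores[s] >= scores[i]]
        # Precision at this positive: share of positives among those examples.
        hits = sum(1 for s in at_or_above if labels[s] == 1)
        total += hits / len(at_or_above)
    return total / len(positives)


# Toy imbalanced-style example (hypothetical numbers).
toy_scores = [0.9, 0.8, 0.7, 0.6]
toy_labels = [1, 0, 1, 0]
toy_ap = average_precision(toy_scores, toy_labels)  # (1/1 + 2/3) / 2 = 5/6
```

Because each term is a ratio of two sums over the data set, the objective has a nested structure, which is consistent with the abstract's description of it as a sum of compositional functions.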
