Abstract: Effective streaming feature selection in dynamic online environments is essential in numerous applications. However, most existing methods evaluate high-dimensional features individually and ignore the group structure that may exist among them. Moreover, the class imbalance underlying streaming data may further reduce the discriminative power of the selected features, degrading classification performance. Motivated by these observations, we propose a cost-sensitive sparse group online learning (CSGOL) framework and its proximal version (PCSGOL) to handle imbalanced, high-dimensional streaming data. We formulate the task as a new cost-sensitive online optimization problem that combines \(\ell _2\)-norm, \(\ell _1\)-norm, and groupwise sparsity constraints in the dual averaging regularization. Inspired by proximal optimization, we further introduce the average weighted distance into CSGOL and develop the PCSGOL method to achieve stable prediction results. We derive closed-form solutions to the optimization problems for four modified hinge loss functions, leading to four variants each of CSGOL and PCSGOL. Extensive empirical studies on real-world streaming datasets and online anomaly detection tasks demonstrate the effectiveness of the proposed methods.
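To make the combination of \(\ell _1\)-norm and groupwise sparsity concrete, the following is a minimal sketch of the standard sparse-group shrinkage operator for non-overlapping groups, together with a class-weighted hinge loss. It is not the CSGOL or PCSGOL update itself: the function names, the cost weights `cost_pos` and `cost_neg`, and the simple weighted hinge are illustrative assumptions, and the four modified hinge losses of the paper are not reproduced here.

```python
import math

def soft_threshold(v, lam):
    """Elementwise l1 shrinkage: sign(x) * max(|x| - lam, 0)."""
    return [math.copysign(max(abs(x) - lam, 0.0), x) for x in v]

def sparse_group_prox(v, groups, lam1, lam2):
    """Proximal operator of lam1*||w||_1 + lam2*sum_g ||w_g||_2.

    Assumes the index lists in `groups` do not overlap. For such groups the
    operator is elementwise soft-thresholding followed by group shrinkage.
    """
    u = soft_threshold(v, lam1)
    w = [0.0] * len(v)
    for idx in groups:
        norm = math.sqrt(sum(u[i] ** 2 for i in idx))
        if norm > lam2:  # otherwise the whole group is zeroed out
            scale = 1.0 - lam2 / norm
            for i in idx:
                w[i] = scale * u[i]
    return w

def cost_sensitive_hinge(w, x, y, cost_pos=5.0, cost_neg=1.0):
    """Hinge loss scaled by a per-class cost (hypothetical weights).

    y is +1 for the minority class and -1 for the majority class.
    """
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    cost = cost_pos if y > 0 else cost_neg
    return cost * max(0.0, 1.0 - margin)

# Example: two groups of two features each; the weak second group is removed.
w = sparse_group_prox([3.0, -4.0, 0.5, -0.2], [[0, 1], [2, 3]], lam1=1.0, lam2=1.0)
```

In the example, the second group is eliminated as a whole while the first group is only shrunk, which is the groupwise feature selection behaviour the abstract refers to. The larger cost on the positive class penalizes errors on the minority class more heavily.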