Online Learning From Incomplete and Imbalanced Data Streams

Published: 01 Jan 2023, Last Modified: 30 Sept 2024. Venue: IEEE Trans. Knowl. Data Eng. 2023. License: CC BY-SA 4.0
Abstract: Learning with streaming data has attracted extensive research interest in recent years. Existing online learning approaches make specific assumptions about data streams, such as requiring fixed feature spaces, or varying feature spaces with explicit patterns, and balanced class distributions. However, the data streams generated in many real scenarios commonly have arbitrarily incomplete feature spaces and dynamically imbalanced class distributions, which makes existing approaches unsuitable for real applications. To address this issue, this paper proposes a novel Online Learning from Incomplete and Imbalanced Data Streams (OLI$^{2}$DS) algorithm. OLI$^{2}$DS rests on two main ideas: 1) it follows the empirical risk minimization principle to identify the most informative features of incomplete feature spaces, and 2) it develops a dynamic cost strategy to handle imbalanced class distributions in real time by transforming F-measure optimization into a weighted surrogate loss minimization. To evaluate OLI$^{2}$DS, we compare it with state-of-the-art related algorithms in three kinds of experiments. First, we adopt 14 real datasets to simulate three scenarios of incomplete feature spaces, i.e., trapezoidal, feature evolvable, and capricious data streams. Second, based on a benchmark online analyzer, we generate 13 datasets to simulate incomplete data streams with different imbalance ratios. Third, we analyze concept drift in two simulated scenes, i.e., online learning and data stream mining, and verify the adaptation of OLI$^{2}$DS to repeated concept drifts and variable imbalance ratios. The results demonstrate that OLI$^{2}$DS achieves significantly better performance than its rivals. In addition, a real-world case study on movie review classification is conducted to illustrate the effectiveness of OLI$^{2}$DS. Code is released at https://github.com/youdianlong/OLI2DS.
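
The two ideas named in the abstract can be illustrated with a minimal sketch. This is not the OLI$^{2}$DS algorithm and is not taken from the released code. It assumes a plain linear model over sparse dictionaries (so each instance may carry an arbitrary subset of features), a class-weighted hinge loss whose weights come from running class counts (a simple stand-in for the F-measure-derived cost), and magnitude-based truncation as a stand-in for informative-feature selection. The class name `SparseCostSensitiveLearner` and the parameters `lr` and `max_features` are hypothetical.

```python
from collections import defaultdict


class SparseCostSensitiveLearner:
    """Illustrative online linear classifier; NOT the OLI2DS algorithm."""

    def __init__(self, lr=0.1, max_features=50):
        self.w = defaultdict(float)        # weights keyed by feature id
        self.lr = lr                       # learning rate (assumed)
        self.max_features = max_features   # sparsity budget (assumed)
        self.n_pos = 0                     # running count of +1 labels
        self.n_neg = 0                     # running count of -1 labels

    def score(self, x):
        # Only the features present in this instance contribute.
        return sum(self.w.get(k, 0.0) * v for k, v in x.items())

    def predict(self, x):
        return 1 if self.score(x) >= 0 else -1

    def _cost(self, y):
        # Heuristic dynamic cost: each class is weighted by the smoothed
        # running share of the other class, so the rarer class costs more.
        total = self.n_pos + self.n_neg
        other = self.n_neg if y == 1 else self.n_pos
        return (other + 1) / (total + 2)

    def update(self, x, y):
        # y must be +1 or -1; x is a dict {feature_id: value}.
        if y == 1:
            self.n_pos += 1
        else:
            self.n_neg += 1
        cost = self._cost(y)
        # Subgradient step on the cost-weighted hinge loss.
        if y * self.score(x) < 1:
            for k, v in x.items():
                self.w[k] += self.lr * cost * y * v
        self._truncate()

    def _truncate(self):
        # Keep only the largest-magnitude weights (assumed proxy for
        # "most informative features").
        if len(self.w) > self.max_features:
            kept = sorted(self.w.items(), key=lambda kv: abs(kv[1]),
                          reverse=True)[: self.max_features]
            self.w = defaultdict(float, kept)


# Toy imbalanced stream: one positive for every nine negatives.
stream = ([({"a": 1.0}, 1)] + [({"b": 1.0}, -1)] * 9) * 5
learner = SparseCostSensitiveLearner()
for features, label in stream:
    learner.update(features, label)
```

Because instances are dictionaries, a feature that appears, disappears, or never recurs needs no special handling in this sketch; the actual method's treatment of trapezoidal, feature evolvable, and capricious streams is described in the paper itself.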
