Abstract: Trapezoidal data streams, in which the number of features keeps growing over time as new instances continuously arrive, are pervasive in many applications. Most existing learning methods for such doubly streaming data make the strong assumption that all features have similar scales. This assumption is impractical in many applications, where feature scales may vary over time. When applied to streaming data whose features have different scales, these methods may converge poorly, and they cannot rescale the features under streaming conditions. In this paper, we propose two online learning algorithms that remain invariant to feature scale even under arbitrary scaling of the features in incoming trapezoidal data streams. Specifically, the two algorithms learn a classifier by keeping track of the cumulative sum of squared gradients, the negative cumulative sum of gradients, the maximum feature values observed, and the cumulative number of occurrences of each feature. The scaled gradients are then used to update the corresponding classifier weights. We evaluate the proposed algorithms on nine UCI datasets. The experimental results demonstrate that the features in incoming trapezoidal data streams indeed have different scales, and that our proposed solutions significantly outperform the state-of-the-art solution in terms of prediction accuracy.
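To make the idea of scale-invariant updates on a growing feature space concrete, the following is a minimal sketch, not the algorithms proposed in the paper. It assumes sparse instances given as `{feature_index: value}` dictionaries, labels in {-1, +1}, and a hinge-loss update; the class name `ScaleTrackingLearner`, the helper `run`, and the specific update rule are hypothetical choices made for illustration. The sketch records the four per-feature statistics named above, but its simplified update uses only the maximum feature value and the cumulative squared gradient.

```python
import math
from collections import defaultdict


class ScaleTrackingLearner:
    """Illustrative only: per-feature statistics over a growing feature space."""

    def __init__(self, lr=0.5):
        self.lr = lr
        self.w = defaultdict(float)         # classifier weights
        self.sq_grad = defaultdict(float)   # cumulative sum of squared gradients
        self.neg_grad = defaultdict(float)  # negative cumulative sum of gradients
        self.max_abs = defaultdict(float)   # largest |value| seen per feature
        self.count = defaultdict(int)       # number of occurrences per feature

    def score(self, x):
        return sum(self.w[i] * v for i, v in x.items())

    def predict(self, x):
        return 1 if self.score(x) >= 0 else -1

    def update(self, x, y):
        # New features simply appear as new keys; no resizing is needed.
        for i, v in x.items():
            self.max_abs[i] = max(self.max_abs[i], abs(v))
            self.count[i] += 1
        if y * self.score(x) >= 1:
            return  # hinge loss is zero, so there is no gradient
        for i, v in x.items():
            if v == 0:
                continue  # zero gradient for this feature
            g = -y * v  # hinge-loss gradient with respect to w[i]
            self.sq_grad[i] += g * g
            self.neg_grad[i] -= g  # recorded, but unused in this sketch
            # Assumed scaling: dividing by max_abs * sqrt(sq_grad) makes
            # w[i] * x[i] unchanged when feature i is multiplied by c > 0.
            step = g / (self.max_abs[i] * math.sqrt(self.sq_grad[i]))
            self.w[i] -= self.lr * step


def run(stream):
    """Predict-then-update over a stream; returns the list of predictions."""
    learner = ScaleTrackingLearner()
    predictions = []
    for x, y in stream:
        predictions.append(learner.predict(x))
        learner.update(x, y)
    return predictions


# A toy trapezoidal stream: features 1 and 2 appear only in later instances.
stream = [
    ({0: 1.0}, 1),
    ({0: -2.0}, -1),
    ({0: 0.5, 1: 300.0}, 1),
    ({0: -1.0, 1: -500.0, 2: 0.01}, -1),
]

# The same stream with features 1 and 2 rescaled by constant positive factors.
factors = {0: 1.0, 1: 1024.0, 2: 0.25}
rescaled = [({i: v * factors[i] for i, v in x.items()}, y) for x, y in stream]
```

Under this assumed update, multiplying a feature by a positive constant c divides its learned weight by c, so `run(stream)` and `run(rescaled)` produce the same predictions. An unscaled gradient step would not have this property.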