Online Machine Learning Algorithms over Data Streams

2019 (modified: 25 Feb 2022), Encyclopedia of Big Data Technologies 2019
Abstract: Online machine learning over big data streams covers algorithms that (1) can store only a limited amount of past data, (2) adapt their models to concept drift on the fly, and (3) work in a distributed computational environment. In this chapter, we give an overview of the main online learning methods for classification and regression, the most important machine learning tasks. We highlight the most important ideas, including linear models, gradient descent, and tree-based methods. In these algorithms, older data is no longer available to revise earlier suboptimal modeling decisions as fresh data arrives. Furthermore, due to the infinite nature of the data stream, online classifiers and regressors are best evaluated by the prequential method, which we also describe in this chapter. This entry is reference material and not a survey. We aim for neither depth nor breadth of coverage, but rather to give the most important pointers and references. This entry can be read independently, but it builds on the concepts introduced in the “Overview of Online Machine Learning in Big Data Streams” chapter of this Handbook. Additional topics are covered in the chapters “Reinforcement Learning, Unsupervised Methods, and Concept Drift in Stream Learning” and “Recommender Systems over Data Streams.”
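
As a rough illustration of two ideas named in the abstract, the sketch below pairs a linear model trained by stochastic gradient descent with prequential (test-then-train) evaluation: each arriving example is first used to score the current model and only then used to update it, after which it is discarded. This is a minimal sketch under stated assumptions, not the chapter's own code. The class `OnlineLogisticRegression`, the functions `prequential_accuracy` and `synthetic_stream`, the learning rate, and the synthetic data are all hypothetical choices made for illustration.

```python
import math
import random


class OnlineLogisticRegression:
    """Hypothetical linear classifier updated by one SGD step per example."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate  # assumed value, not from the source

    def predict_proba(self, x):
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        z = max(-30.0, min(30.0, z))  # clamp so exp() stays well-behaved
        return 1.0 / (1.0 + math.exp(-z))

    def predict(self, x):
        return 1 if self.predict_proba(x) >= 0.5 else 0

    def learn_one(self, x, y):
        # Gradient of the log loss with respect to the linear score.
        error = self.predict_proba(x) - y
        for i, xi in enumerate(x):
            self.weights[i] -= self.learning_rate * error * xi
        self.bias -= self.learning_rate * error


def prequential_accuracy(model, stream):
    """Test-then-train: score each example before the model learns from it."""
    correct = 0
    total = 0
    for x, y in stream:
        correct += int(model.predict(x) == y)  # test first
        model.learn_one(x, y)                  # then train; example is not stored
        total += 1
    return correct / total if total else 0.0


def synthetic_stream(n, seed=0):
    """Hypothetical stream: two features, label is 1 when their sum is positive."""
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        yield x, (1 if x[0] + x[1] > 0 else 0)


accuracy = prequential_accuracy(OnlineLogisticRegression(2), synthetic_stream(2000))
print(f"prequential accuracy: {accuracy:.3f}")
```

Because every example is scored before the model sees its label, the running accuracy needs no held-out test set and no stored history, which is what makes this style of evaluation a natural fit for unbounded streams.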