Automatically Optimized Gradient Boosting Trees for Classifying Large Volume High Cardinality Data Streams Under Concept Drift
Abstract: Data abundance along with scarcity of machine learning experts and
domain specialists necessitates progressive automation of end-to-end machine
learning workflows. To this end, Automated Machine Learning (AutoML) has
emerged as a prominent research area. Real-world data often arrives in streams or
batches, and the data distribution evolves over time, causing concept drift. Models
must handle data that is not independent and identically distributed (i.i.d.) and
transfer knowledge across time through continuous self-evaluation and adaptation while adhering
to resource constraints. Creating autonomous self-maintaining models which not
only discover an optimal pipeline, but also automatically adapt to concept drift
to operate in a lifelong learning setting was the crux of the NeurIPS 2018 AutoML
challenge. We describe our winning solution to the challenge, named AutoGBT,
which combines an adaptive self-optimized end-to-end machine learning pipeline
based on gradient boosting trees with automatic hyper-parameter tuning using
Sequential Model-Based Optimization (SMBO). We report experimental results on
the challenge datasets as well as on several benchmark datasets affected by concept
drift, comparing AutoGBT with the challenge baseline model and with Auto-sklearn.
The results indicate the effectiveness of the proposed methodology in this context.
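The abstract names the core technique as a gradient boosting pipeline whose hyper-parameters are tuned by SMBO. The following is a minimal sketch of such a loop, assuming LightGBM as the learner and hyperopt's TPE as the SMBO backend; the search-space parameters and ranges shown are illustrative choices, not the authors' exact configuration.

```python
# Minimal sketch: SMBO (TPE via hyperopt) tuning a gradient boosting classifier.
# Assumptions: LightGBM as the learner, synthetic data standing in for a stream batch.
import lightgbm as lgb
import numpy as np
from hyperopt import fmin, tpe, hp, STATUS_OK, Trials
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Illustrative search space; ranges are placeholders, not the paper's settings.
space = {
    "num_leaves": hp.quniform("num_leaves", 16, 128, 1),
    "learning_rate": hp.loguniform("learning_rate", np.log(0.01), np.log(0.3)),
    "min_child_samples": hp.quniform("min_child_samples", 5, 100, 1),
}

def objective(params):
    model = lgb.LGBMClassifier(
        num_leaves=int(params["num_leaves"]),
        learning_rate=params["learning_rate"],
        min_child_samples=int(params["min_child_samples"]),
        n_estimators=200,
    )
    model.fit(X_tr, y_tr)
    auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
    # SMBO minimizes the objective, so return the negated validation AUC.
    return {"loss": -auc, "status": STATUS_OK}

trials = Trials()
best = fmin(objective, space, algo=tpe.suggest, max_evals=30, trials=trials)
print("best hyper-parameters:", best)
```

In a drifting-stream setting, a loop of this kind would be re-run (or warm-started from previous trials) on each new data batch so the model can re-optimize as the distribution shifts.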