Abstract: Most methods for learning classifiers from data streams assume that the true class of an incoming object is available right after the instance has been processed, so that the new labeled instance can be used to update the classifier's model, detect drift, or capture novel concepts. However, the assumption of unlimited access to class labels is naive and would usually entail a very high labeling cost. The applicability of many supervised techniques is therefore limited in real-life stream analytics scenarios. Active learning emerges as a potential solution to this problem, concentrating on selecting only the most valuable instances and learning an accurate predictive model with as few labeling queries as possible. However, learning from data streams differs from online learning in that the distribution of examples may change over time. An active learning strategy must therefore be able to handle concept drift and adapt quickly to the evolving nature of the data. In this paper we present novel active learning strategies designed to tackle such changes effectively. We assume that the most labeling effort is required when concept drift occurs, as a representative sample of the new concept is needed to properly retrain the predictive model. We therefore propose active learning strategies that are guided by a drift detection module to save budget for difficult and evolving instances. The three proposed strategies are based on learner uncertainty, dynamic allocation of budget over time, and search space randomization. Experimental evaluation of the proposed methods demonstrates their usefulness for reducing labeling effort in learning from drifting data streams.