Abstract: Data transformation plays an essential role as a preprocessing step in learning models. Several classification techniques make assumptions about the underlying data distribution, such as the normal distribution assumed by Bayesian classifiers. However, applying data transformation in a streaming setting requires processing an infinite and continuous flow of data. In this paper, we propose the Incremental Yeo-Johnson Power Transformation, a variant of the well-known batch Yeo-Johnson transformation tailored for streaming settings, i.e., it supports streaming data via statistical sampling and hypothesis testing. Experimental results show that our proposal achieves the same data normality as its batch counterpart. In addition, it improves the prediction performance of a data stream classifier based on Bayesian statistical models, with learning models obtaining an overall improvement of 3 percentage points.
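
To illustrate the general idea of adapting a batch power transformation to a stream, the sketch below combines reservoir sampling with periodic re-estimation of the Yeo-Johnson power parameter lambda using SciPy's batch estimator. This is a minimal illustration under stated assumptions, not the method proposed in the paper: the class name, reservoir size, and re-fitting interval are hypothetical, and the paper's hypothesis-testing step is not reproduced here.

```python
# Hypothetical sketch: incremental Yeo-Johnson via reservoir sampling and
# periodic re-estimation of lambda. Illustrative only; not the paper's method.
import random

import numpy as np
from scipy.stats import yeojohnson, yeojohnson_normmax


class IncrementalYeoJohnsonSketch:
    def __init__(self, sample_size=1000, refit_every=200, seed=42):
        self.sample_size = sample_size  # reservoir capacity (assumed value)
        self.refit_every = refit_every  # re-estimate lambda every k points (assumed value)
        self.rng = random.Random(seed)
        self.reservoir = []
        self.seen = 0
        self.lmbda = 1.0                # lambda = 1 leaves the data essentially unchanged

    def _update_reservoir(self, x):
        # Classic reservoir sampling keeps a uniform random sample of the stream so far.
        self.seen += 1
        if len(self.reservoir) < self.sample_size:
            self.reservoir.append(x)
        else:
            j = self.rng.randrange(self.seen)
            if j < self.sample_size:
                self.reservoir[j] = x

    def partial_fit(self, x):
        self._update_reservoir(x)
        if self.seen % self.refit_every == 0 and len(self.reservoir) > 10:
            # Batch maximum-likelihood estimate of lambda on the current sample.
            self.lmbda = yeojohnson_normmax(np.asarray(self.reservoir))

    def transform(self, x):
        # Apply the Yeo-Johnson transform with the current lambda estimate.
        return yeojohnson(np.asarray([x], dtype=float), lmbda=self.lmbda)[0]


# Usage: transform a skewed synthetic stream point by point.
if __name__ == "__main__":
    stream = np.random.default_rng(0).exponential(scale=2.0, size=5000)
    iyj = IncrementalYeoJohnsonSketch()
    transformed = [iyj.transform(v) for v in stream if iyj.partial_fit(v) is None]
    print("estimated lambda:", round(float(iyj.lmbda), 3))
```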