Abstract: Data transformation plays an essential role as a preprocessing step in learning models. Several classification techniques make assumptions about the underlying data distribution, such as the normal distribution assumed by Bayesian classifiers. However, applying data transformation in a streaming setting requires processing an infinite and continuous flow of data. In this paper, we propose the Incremental Yeo-Johnson Power Transformation, a variant of the well-known batch Yeo-Johnson transformation tailored for streaming settings, i.e., it supports streaming data via statistical sampling and hypothesis testing. Experimental results show that our proposal achieves the same data normality as its batch counterpart. In addition, it improves the prediction performance of a data stream classifier based on Bayesian statistical models, with learning models obtaining an overall improvement of 3 percentage points.
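
To illustrate the general idea of adapting a batch power transformation to a stream, the sketch below combines reservoir sampling with periodic re-estimation of the Yeo-Johnson power parameter lambda using SciPy's batch estimator. This is a minimal illustration under stated assumptions, not the method proposed in the paper: the class name, reservoir size, and re-fitting interval are hypothetical, and the paper's hypothesis-testing step is not reproduced here.

```python
# Hypothetical sketch: incremental Yeo-Johnson via reservoir sampling and
# periodic re-estimation of lambda. Illustrative only; not the paper's method.
import random

import numpy as np
from scipy.stats import yeojohnson, yeojohnson_normmax


class IncrementalYeoJohnsonSketch:
    def __init__(self, sample_size=1000, refit_every=200, seed=42):
        self.sample_size = sample_size  # reservoir capacity (assumed value)
        self.refit_every = refit_every  # re-estimate lambda every k points (assumed value)
        self.rng = random.Random(seed)
        self.reservoir = []
        self.seen = 0
        self.lmbda = 1.0                # lambda = 1 leaves the data essentially unchanged

    def _update_reservoir(self, x):
        # Classic reservoir sampling keeps a uniform random sample of the stream so far.
        self.seen += 1
        if len(self.reservoir) < self.sample_size:
            self.reservoir.append(x)
        else:
            j = self.rng.randrange(self.seen)
            if j < self.sample_size:
                self.reservoir[j] = x

    def partial_fit(self, x):
        self._update_reservoir(x)
        if self.seen % self.refit_every == 0 and len(self.reservoir) > 10:
            # Batch maximum-likelihood estimate of lambda on the current sample.
            self.lmbda = yeojohnson_normmax(np.asarray(self.reservoir))

    def transform(self, x):
        # Apply the Yeo-Johnson transform with the current lambda estimate.
        return yeojohnson(np.asarray([x], dtype=float), lmbda=self.lmbda)[0]


# Usage: transform a skewed synthetic stream point by point.
if __name__ == "__main__":
    stream = np.random.default_rng(0).exponential(scale=2.0, size=5000)
    iyj = IncrementalYeoJohnsonSketch()
    transformed = [iyj.transform(v) for v in stream if iyj.partial_fit(v) is None]
    print("estimated lambda:", round(float(iyj.lmbda), 3))
```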