Abstract: Standard regression models are presented with n samples from an input space X, given as observational data of the form (x<sub>i</sub>, y(x<sub>i</sub>)), i = 1...n, where each x<sub>i</sub> denotes a k-dimensional input vector of design variables and y is the response. When k ≫ n, high variance and over-fitting become major concerns. In this paper we propose a novel approach that mitigates this problem by transforming the input vectors into new, smaller vectors (called the Z set) using only a set of simple statistical moments. A Genetic Algorithm (GA) is used to evolve the transformation procedure: it optimises the sequence of statistical moments and their input parameters. We use Linear Regression (LR) as an example to quantify the quality of the evolved transformation procedure. Empirical evidence, collected from benchmark functions and real-world problems, demonstrates that the proposed transformation approach is able to dramatically improve LR generalisation and make it outperform other state-of-the-art regression models such as Genetic Programming, Kriging, and Radial Basis Function Networks. In addition, we present an analysis to shed light on the statistical moments that are most useful for the transformation process.
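As a rough illustration of the transformation step described in the abstract, the sketch below maps each k-dimensional input vector to a shorter vector of simple statistics. It is not the authors' implementation: the menu of statistics in `MOMENTS`, the gene encoding `(name, start, end)`, and the function names `transform` and `build_z_set` are all assumptions made for this example, and the abstract does not specify what the "input parameters" of each moment are. Here they are taken to be slice bounds over the input vector.

```python
import statistics
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical menu of simple statistics; the set used in the paper may differ.
MOMENTS: Dict[str, Callable[[Sequence[float]], float]] = {
    "mean": statistics.fmean,
    "median": statistics.median,
    "variance": statistics.pvariance,
    "min": min,
    "max": max,
}

# Assumed encoding: one gene = (statistic name, start index, end index).
# The slice bounds stand in for the "input parameters" of each moment.
Gene = Tuple[str, int, int]


def transform(x: Sequence[float], genes: Sequence[Gene]) -> List[float]:
    """Map one k-dimensional input vector to a len(genes)-dimensional vector."""
    return [MOMENTS[name](x[start:end]) for name, start, end in genes]


def build_z_set(X: Sequence[Sequence[float]], genes: Sequence[Gene]) -> List[List[float]]:
    """Apply the same sequence of statistics to every sample in X."""
    return [transform(x, genes) for x in X]


# A hand-written candidate; in the paper's setting a GA would search for it.
genes: List[Gene] = [("mean", 0, 6), ("variance", 0, 3), ("max", 3, 6)]

# Two toy samples with k = 6, reduced to 3 features each.
X = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [2.0, 2.0, 2.0, 0.0, 1.0, 5.0],
]
Z = build_z_set(X, genes)
```

In this reading, the rows of `Z` would replace the original inputs when fitting the regression model, and the GA's fitness for a candidate `genes` sequence would be some measure of the fitted model's prediction error. Both points are interpretations of the abstract rather than details it confirms.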
External IDs: dblp:conf/cec/KattanKOM14