The Kernel Density Integral Transformation
Abstract: Feature preprocessing continues to play a critical role when applying machine learning and statistical methods to tabular data. In this paper, we propose the use of the kernel density integral transformation as a feature preprocessing step. Our approach subsumes the two leading feature preprocessing methods as limiting cases: linear min-max scaling and quantile transformation. We demonstrate that, without hyperparameter tuning, the kernel density integral transformation can be used as a simple drop-in replacement for either method, offering robustness to the weaknesses of each. Alternatively, with tuning of a single continuous hyperparameter, we frequently outperform both of these methods. Finally, we show that the kernel density integral transformation can be profitably applied to statistical data analysis, particularly in correlation analysis and univariate clustering.
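The limiting-case claim in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a Gaussian kernel (the revision notes say the paper uses a polynomial-exponential kernel for efficiency), assumes the bandwidth is a factor `alpha` times the sample standard deviation, and the function name `kd_integral_transform` is hypothetical. Each value is mapped to the integral of the kernel density estimate up to that value, rescaled so that the training minimum maps to 0 and the maximum to 1.

```python
import math
from statistics import pstdev


def kd_integral_transform(train, xs, alpha=1.0):
    """Map each value in xs to [0, 1] using a Gaussian KDE fit on train.

    alpha is a bandwidth factor; scaling the bandwidth by the sample
    standard deviation is an assumption of this sketch.
    """
    h = alpha * pstdev(train)  # bandwidth (requires non-constant train)

    def kde_cdf(q):
        # Integral of the KDE from -inf to q: the average of Gaussian
        # CDFs centered on the training points.
        return sum(
            0.5 * (1.0 + math.erf((q - t) / (h * math.sqrt(2.0))))
            for t in train
        ) / len(train)

    lo, hi = kde_cdf(min(train)), kde_cdf(max(train))
    # Rescale so the training minimum maps to 0 and the maximum to 1,
    # and clip values that fall outside the training range.
    return [min(1.0, max(0.0, (kde_cdf(x) - lo) / (hi - lo))) for x in xs]


data = [0.0, 1.0, 2.0, 3.0, 100.0]  # one large outlier
for alpha in (1e3, 1.0, 1e-3):
    print(alpha, [round(v, 3) for v in kd_integral_transform(data, data, alpha)])
```

Under these assumptions, a very large `alpha` makes the density nearly flat over the data range, so the output approaches min-max scaling (the four small values are squeezed near 0 by the outlier). A very small `alpha` makes the integral approach the empirical distribution function, so the output approaches evenly spaced ranks, as in a quantile transformation. Intermediate values of `alpha` interpolate between the two behaviors.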
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission:
V2 revision: We thank all the reviewers for their time and attention. We have made numerous improvements to the manuscript based on their helpful feedback. The following key revisions will likely be of interest to all reviewers.
- Title, introduction & related work: We renamed the paper from "The Kernel Density Quantile Transformation" to "The Kernel Density Integral Transformation" and revised the introduction and related work to clarify our proposed approach and its relationship to both the quantile function and copula-based methods.
- Section 2.3 ("Efficient computation"): We improved the computational efficiency using the polynomial-exponential kernel (Hofmeyr, 2019), with accompanying analysis.
- Section 3.1 ("Feature preprocessing for supervised learning"): We included empirical analysis of using cross-validation to tune the bandwidth. We also added experiments for supervised linear regression, which notably show greater benefits for our approach than for classification.
V3 revision:
- More details on the polynomial-exponential kernel.
- Additional Discussion section with recommendations for practitioners.
- Additional experiments on optimal bandwidth as a function of sample size.
- Miscellaneous small fixes.
Camera ready revision:
- Added link to deanonymized GitHub code repository.
Supplementary Material: pdf
Assigned Action Editor: ~Jeff_Phillips1
Submission Number: 1458