Fast and Understandable Nonlinear Supervised Dimensionality Reduction

Published: 01 Jan 2024, Last Modified: 02 Oct 2025, DS (1) 2024, CC BY-SA 4.0
Abstract: In supervised machine learning, feature creation and dimensionality reduction are essential tasks. Carefully chosen features allow simpler model structures, such as linear models, while reducing the number of features is a common way to curb overfitting. Classical unsupervised dimensionality reduction methods, such as principal component analysis, may find features irrelevant to the machine learning task. Supervised dimensionality reduction methods, such as canonical correlation analysis, can construct linear projections of the original features informed by the prediction targets; however, the dimensionality of these projections is typically restricted to that of the target variables. Deep learning-based approaches (supervised or unsupervised), on the other hand, can construct high-performing features, but these features are not understandable and the models are often slow to train. We propose a novel supervised dimensionality reduction method, called Gradient Boosting Mapping (gbmap): a fast alternative to linear methods that makes a minimal alteration, a nonlinear transformation of linear projections, designed to retain understandability. gbmap is fast to compute, provides high-quality, understandable features, and automatically ignores directions in the original data irrelevant to the prediction task. It is thus a good alternative to both "too simple" linear methods and "too complex" black-box methods.
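To make the idea of "a minimal nonlinear transformation of linear projections fitted by boosting" concrete, here is a minimal sketch, not the authors' reference implementation. It assumes squared-error regression, softplus as the nonlinearity, and greedy residual fitting; the function names (`fit_gbmap_like`, `transform`) and the exact weak-learner form are assumptions for illustration only.

```python
# Hypothetical sketch (not the paper's implementation): each boosting round
# fits one feature z_j(x) = softplus(w_j . x + b_j), i.e. a nonlinear
# transform of a linear projection, to the current residual.
import numpy as np
from scipy.optimize import minimize


def softplus(t):
    # Numerically stable log(1 + exp(t)).
    return np.logaddexp(0.0, t)


def fit_gbmap_like(X, y, n_features=2, seed=0):
    """Greedily fit `n_features` boosted projection features to (X, y)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    residual = y.astype(float).copy()
    params = []  # one (w, b, scale) triple per learned feature

    for _ in range(n_features):
        def loss(theta):
            w, b, a = theta[:d], theta[d], theta[d + 1]
            return np.mean((residual - a * softplus(X @ w + b)) ** 2)

        theta0 = rng.normal(scale=0.1, size=d + 2)
        res = minimize(loss, theta0, method="L-BFGS-B")
        w, b, a = res.x[:d], res.x[d], res.x[d + 1]
        # Boosting step: later features only see what earlier ones missed.
        residual = residual - a * softplus(X @ w + b)
        params.append((w, b, a))
    return params


def transform(X, params):
    """Map data to the learned low-dimensional feature space.

    The scale `a` belongs to the predictor, so the embedding itself is the
    unscaled softplus of each linear projection.
    """
    return np.column_stack([softplus(X @ w + b) for w, b, _ in params])
```

Because each feature is just a softplus of a single linear projection, the weight vector `w_j` can be read off directly to see which original features drive it, while residual fitting steers later projections away from directions the earlier ones already explain.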