ESTIMATING IMPORTANCE OF HIGHLY CORRELATED FEATURES USING MATRIX DECOMPOSITION

Published: 15 Mar 2026, Last Modified: 15 Mar 2026 · Oral · CC BY 4.0
Keywords: hyperspectral images, machine learning, decision trees, feature selection
TL;DR: An algorithm for approximating feature importance that takes the ML model's accuracy into account is proposed.
Abstract: Hyperspectral images contain a large volume of source data that exhibits high correlation across neighboring spectral bands. To solve machine learning problems effectively, it is therefore necessary to select the most informative features from groups of correlated features. A method for evaluating feature importance in hyperspectral image data is proposed. The method combines iterative training of decision tree classifiers on spectral features with matrix decomposition to overcome sparsity. Decision trees provide an intrinsic feature selection mechanism, but the CART algorithm usually takes only a small number of features into account when training a single decision tree classifier. Furthermore, when features are highly correlated (e.g., Pearson ρ > 0.8), tree-based methods such as Random Forest or XGBoost arbitrarily assign importance to one feature while suppressing the others, since they redundantly capture the same signal. The proposed method is compared with several tree-based methods for feature importance evaluation, such as the vanilla Gini impurity decrease and the more sophisticated Boruta algorithm. The selected features are evaluated using a thick-cloud classification algorithm on labeled satellite data. Classification accuracy based on the significant features is tested for different surface types over a set of single images.
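As a minimal illustration of the correlated-feature problem described in the abstract (not the paper's method), the sketch below trains a greedy CART-style decision stump on two perfectly correlated features. The split criterion cannot distinguish them, so all importance lands on whichever feature is examined first while the equally informative duplicate receives zero. All function names here are illustrative, not from the paper.

```python
def gini(labels):
    # Gini impurity of a binary (0/1) label list.
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 1.0 - p1 * p1 - (1.0 - p1) ** 2

def best_split_gain(X, y, j):
    # Best Gini impurity decrease over all thresholds of feature j.
    parent = gini(y)
    best = 0.0
    for t in sorted(set(row[j] for row in X)):
        left = [y[i] for i, row in enumerate(X) if row[j] <= t]
        right = [y[i] for i, row in enumerate(X) if row[j] > t]
        if not left or not right:
            continue
        child = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        best = max(best, parent - child)
    return best

def stump_importance(X, y):
    # Greedy stump: all importance goes to the single feature with the
    # highest gain; ties are broken arbitrarily (here, by lowest index).
    gains = [best_split_gain(X, y, j) for j in range(len(X[0]))]
    winner = max(range(len(gains)), key=lambda j: gains[j])
    return [1.0 if j == winner else 0.0 for j in range(len(gains))]

# Feature 1 is an exact linear copy of feature 0 (correlation = 1):
X = [[x, 2 * x] for x in [0, 1, 2, 3, 4, 5]]
y = [0, 0, 0, 1, 1, 1]
print(stump_importance(X, y))  # [1.0, 0.0]
```

Both features yield an identical maximal gain, yet the stump reports `[1.0, 0.0]`: feature 1 is suppressed despite carrying the same signal. Averaging such sparse importance vectors over many tree instances, which is the role the abstract assigns to iterative training plus matrix decomposition, is what recovers a dense importance estimate for every band.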
Submission Number: 142