The revival of the Gini importance?

Stefano Nembrini; Inke R. König; Marvin N. Wright

Published: 01 Jan 2018, Last Modified: 07 Oct 2024; Bioinform. 2018; CC BY-SA 4.0

Abstract: Random forests are fast and flexible, and they offer a robust approach to analyzing high-dimensional data. A key advantage over alternative machine learning algorithms is their variable importance measures, which can be used to identify relevant features or to perform variable selection. Measures based on the impurity reduction of splits, such as the Gini importance, are popular because they are simple and fast to compute. However, they are biased in favor of variables with many possible split points and variables with a high minor allele frequency.
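
To make the notion of impurity reduction concrete, the following is a minimal sketch of how the Gini decrease of a single split can be computed, and of why a variable with more candidate split points has more chances to reach a large decrease. It is an illustration only, not the authors' implementation: the function names (`gini_impurity`, `impurity_decrease`, `best_split_decrease`) and the six-sample toy data are assumptions introduced here.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(x, y, threshold):
    """Parent impurity minus the size-weighted child impurity for x <= threshold."""
    left = [label for value, label in zip(x, y) if value <= threshold]
    right = [label for value, label in zip(x, y) if value > threshold]
    n = len(y)
    children = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(y) - children


def best_split_decrease(x, y):
    """Largest decrease over all candidate thresholds of x."""
    # Every distinct value except the maximum gives a non-trivial split.
    candidates = sorted(set(x))[:-1]
    return max((impurity_decrease(x, y, t) for t in candidates), default=0.0)


# Hypothetical toy data: one outcome and two predictors chosen for illustration.
y = [0, 1, 0, 1, 1, 0]
x_binary = [0, 0, 0, 1, 1, 1]   # one candidate split point
x_many = [1, 2, 3, 4, 5, 6]     # five candidate split points

print(best_split_decrease(x_binary, y))
print(best_split_decrease(x_many, y))
```

In this toy case the predictor with five candidate thresholds attains a larger best decrease than the binary one, simply because the maximum is taken over more splits. A single example like this does not demonstrate the bias by itself; it only shows the mechanism by which more split points can inflate an impurity-based importance.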
