Minimization of Gini Impurity: NP-completeness and Approximation Algorithm via Connections with the $k$-means Problem

06 Apr 2023, OpenReview Archive Direct Upload, Readers: Everyone
Abstract: The Gini impurity is a very popular criterion for selecting attributes during decision tree construction. In the problem of finding a partition with minimum weighted Gini impurity (PMWGP), which arises during the construction of decision trees, a set of vectors must be partitioned into $k$ clusters such that the partition's overall Gini impurity is minimized. We show that PMWGP is APX-hard for arbitrary $k$ and admits a randomized PTAS when the number of clusters is fixed. These results significantly improve the current knowledge on the problem. The key idea behind these results is to explore connections between PMWGP and the geometric $k$-means clustering problem.