Adaptive $k$-nearest neighbor classifier based on the local estimation of the shape operator

TMLR Paper2676 Authors

12 May 2024 (modified: 17 Sept 2024)Rejected by TMLREveryoneRevisionsBibTeXCC BY 4.0
Abstract: Nonparametric classification does not assume a particular functional form of the underlying data distribution, being a suitable approach for a wide variety of data sets. The $k$-nearest neighbor ($k$-NN) algorithm is one of the most popular methods for nonparametric classification. However, a relevant limitation concerns the definition of the number of neighbors $k$. This parameter exerts a direct impact on several properties of the classifier, such as the bias-variance tradeoff, smoothness of decision boundaries, robustness to noise, and class imbalance handling. In the present paper, we propose a new adaptive $k$-nearest neighbours ($kK$-NN) algorithm that explores the local curvature at a sample to automatically define the neighborhood size. The rationale is that points with low curvature could have larger neighborhoods (locally, the tangent space approximates well the underlying data shape), whereas points with high curvature could have smaller neighborhoods (locally, the tangent space is a loose approximation). We estimate the local Gaussian curvature by computing an approximation to the local shape operator in terms of the local covariance matrix as well as the local Hessian matrix. Results on many real-world data sets indicate that the new $kK$-NN algorithm may provide superior balanced accuracy compared to the established $k$-NN method. This is particularly evident when the number of samples in the training data is limited, suggesting that the $kK$-NN is capable of learning more discriminant functions with less data in some relevant cases.
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Benjamin_Guedj1
Submission Number: 2676
Loading