Benefit of interpolation in nearest neighbor algorithms

12 May 2023 (modified: 12 May 2023)
Abstract: Some studies of deep learning (e.g., [C. Zhang et al., in Proceedings of the 5th International Conference on Learning Representations, OpenReview.net, 2017]) observe that overparametrized deep neural networks achieve a small testing error even when the training error is almost zero. Despite numerous works toward understanding this so-called double-descent phenomenon (e.g., [M. Belkin et al., Proc. Natl. Acad. Sci. USA, 116 (2019), pp. 15849--15854; M. Belkin, D. Hsu, and J. Xu, SIAM J. Math. Data Sci., 2 (2020), pp. 1167--1180]), in this paper we turn to another way to enforce zero training error (without overparametrization): a data interpolation mechanism. Specifically, we consider a class of interpolated weighting schemes in nearest neighbor (NN) algorithms. By carefully characterizing the multiplicative constant in the statistical risk, we reveal a U-shaped performance curve in the level of data interpolation, in both the classification and the regression setups. This sharpens the existing result [M. Belkin, A. Rakhlin, and A. B. Tsybakov, in Proceedings of Machine Learning Research 89, PMLR, 2019, pp. 1611--1619] that zero training error does not necessarily jeopardize predictive performance, and yields the counterintuitive conclusion that a mild degree of data interpolation strictly improves the prediction performance and statistical stability over those of the (uninterpolated) k-NN algorithm. Finally, we also discuss the universality of our results, e.g., under a change of distance measure and with corrupted testing data.
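To make the interpolated weighting idea concrete, here is a minimal sketch of an interpolated-weight k-NN regressor. It assumes distance-power weights proportional to the inverse distance raised to a power gamma on the k nearest neighbors; the function name interpolated_knn_predict, the parameter gamma, and the toy data are illustrative and are not the paper's exact construction. With gamma = 0 the rule reduces to the ordinary (unweighted) k-NN average, while gamma > 0 up-weights closer neighbors and forces the fitted function to interpolate the training data, i.e., to attain zero training error.

```python
import numpy as np

def interpolated_knn_predict(X_train, y_train, x, k=5, gamma=1.0, eps=1e-12):
    """Interpolated-weight k-NN regression (illustrative sketch only).

    gamma = 0 recovers the plain k-NN average; gamma > 0 gives weights
    that diverge as the query approaches a training point, so the fitted
    function passes through the training data (zero training error).
    """
    dists = np.linalg.norm(X_train - x, axis=1)
    idx = np.argsort(dists)[:k]          # indices of the k nearest neighbors
    d = dists[idx]
    if d[0] < eps:                       # query coincides with a training point:
        return y_train[idx[0]]           # return its label exactly (interpolation)
    w = d ** (-gamma)                    # distance-power interpolation weights
    return np.sum(w * y_train[idx]) / np.sum(w)

# Toy usage: noisy 1-D regression of f(x) = sin(x).
rng = np.random.default_rng(0)
X = rng.uniform(0, 2 * np.pi, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
x0 = np.array([1.0])
for gamma in [0.0, 0.5, 2.0]:            # gamma controls the interpolation level
    print(gamma, interpolated_knn_predict(X, y, x0, k=10, gamma=gamma))
```

Sweeping the interpolation level (here gamma) while tracking the test risk is the kind of experiment that would trace out the U-shaped performance curve described in the abstract.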