Improved Inference for Imputation-Based Semisupervised Learning Under Misspecified Setting

Published: 01 Jan 2022 · Last Modified: 03 Feb 2025 · IEEE Trans. Neural Networks Learn. Syst. 2022 · CC BY-SA 4.0
Abstract: Semisupervised learning (SSL) has been extensively studied in the literature. Despite its success, many existing learning algorithms for semisupervised problems require specific distributional assumptions, such as the "cluster assumption" and the "low-density assumption," which are often hard to verify in practice. We are interested in quantifying the effect of SSL based on kernel methods under a misspecified setting. The misspecified setting means that the target function is not contained in the hypothesis space under which a given learning algorithm works. Practically, this assumption is mild and standard for various kernel-based approaches. Under this misspecified setting, this article attempts to provide a theoretical justification of when and how unlabeled data can be exploited to improve inference in a learning task. Our theoretical justification is given from the viewpoint of the asymptotic variance of our proposed two-step estimation. It is shown that the proposed pointwise nonparametric estimator has a smaller asymptotic variance than the supervised estimator using the labeled data alone. Several simulation experiments support our theoretical results.
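The abstract does not spell out the estimator's exact form, but a generic imputation-based two-step kernel procedure of the kind it describes can be sketched as follows: fit a supervised kernel estimator on the labeled data, impute pseudo-labels for the unlabeled points, and refit on the union. The RBF kernel, kernel ridge regression, and all data-generating choices below are illustrative assumptions, not the paper's specification.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Gaussian (RBF) kernel matrix from pairwise squared distances
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def fit_krr(X, y, lam=1e-2, gamma=1.0):
    # kernel ridge regression: solve (K + n*lam*I) alpha = y
    n = len(X)
    K = rbf_kernel(X, X, gamma)
    alpha = np.linalg.solve(K + lam * n * np.eye(n), y)
    # return a predictor closing over the training inputs
    return lambda Xq: rbf_kernel(Xq, X, gamma) @ alpha

rng = np.random.default_rng(0)
f = lambda x: np.sin(2 * x).ravel()          # illustrative target
X_lab = rng.uniform(-2, 2, size=(30, 1))     # small labeled sample
y_lab = f(X_lab) + 0.3 * rng.standard_normal(30)
X_unl = rng.uniform(-2, 2, size=(300, 1))    # larger unlabeled sample

# Step 1: supervised estimator from labeled data alone
step1 = fit_krr(X_lab, y_lab)

# Step 2: impute pseudo-labels on the unlabeled points, refit on the union
y_imp = step1(X_unl)
X_all = np.vstack([X_lab, X_unl])
y_all = np.concatenate([y_lab, y_imp])
step2 = fit_krr(X_all, y_all)
```

Under the paper's analysis, the interest is in the asymptotic variance of the step-2 pointwise estimator relative to the step-1 (labeled-only) one; the sketch above only fixes the computational pipeline, not that variance comparison.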