A New Concentration Inequality for Sampling Without Replacement and Its Application for Transductive Learning
TL;DR: We present a new concentration inequality for the supremum of the empirical process associated with sampling without replacement, and apply it to prove a sharper generalization bound for transductive kernel learning.
Abstract: We introduce a new tool, Transductive Local Complexity (TLC), to analyze the generalization performance of transductive learning methods and to motivate new transductive learning algorithms. Our work extends the idea of the popular Local Rademacher Complexity (LRC) to the transductive setting, with considerable and novel changes compared to the analysis of typical LRC methods in the inductive setting. While LRC has been widely used as a powerful tool for analyzing inductive models, yielding sharp generalization bounds for classification and minimax rates for nonparametric regression, it has remained an open problem whether a localized Rademacher complexity based tool can be designed for transductive learning that yields a sharp excess risk bound consistent with the inductive excess risk bound obtained via LRC. We give an affirmative answer to this open problem with TLC. Similar to the development of LRC, we build TLC by first establishing a novel and sharp concentration inequality for the supremum of the empirical process measuring the gap between test and training loss in the setting of sampling uniformly without replacement. A peeling strategy and a new surrogate variance operator are then used to derive an excess risk bound in the transductive setting that is consistent with the classical LRC based excess risk bound in the inductive setting.
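To make the analyzed quantity concrete, here is a schematic of the test-train process that the new concentration inequality controls; the notation $m$, $u$, $S$, $\mathcal{F}$, and $\ell$ below is illustrative and chosen for this sketch, not taken from the paper. Given a full sample $z_1, \dots, z_{m+u}$, a training index set $S$ of size $m$ is drawn uniformly without replacement, the remaining $u$ indices form the test set, and the object of interest is
\[
\sup_{f \in \mathcal{F}} \left[ \frac{1}{u} \sum_{i \notin S} \ell(f, z_i) \;-\; \frac{1}{m} \sum_{i \in S} \ell(f, z_i) \right],
\]
that is, the largest gap between test loss and training loss over the function class; the concentration inequality bounds how far this supremum deviates from its expectation over the random draw of $S$.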
As an application of TLC, we use the new tool to analyze the Transductive Kernel Learning (TKL) model and derive a sharper excess risk bound than the current state-of-the-art. As a result of independent interest, the concentration inequality for the test-train process is used to derive a sharp concentration inequality for the general supremum of an empirical process over random variables sampled uniformly without replacement, which we compare with existing concentration inequalities.
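For context, a generic version of this supremum under sampling without replacement can be sketched as follows, again in illustrative notation not taken from the paper: with $Z_1, \dots, Z_m$ drawn uniformly without replacement from a finite population $\{z_1, \dots, z_n\}$, the quantity of interest is
\[
\sup_{f \in \mathcal{F}} \left| \frac{1}{m} \sum_{j=1}^{m} f(Z_j) \;-\; \frac{1}{n} \sum_{i=1}^{n} f(z_i) \right|,
\]
and the result bounds the deviation of this quantity from its expectation, to be compared with existing inequalities for this setting.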
Lay Summary: When we train machine learning models, we usually assume we have one dataset to train on and a separate, unseen dataset to test on. But in many real-world scenarios, we already know the test data; we just don't have its labels yet. This setting is called transductive learning.
Our work introduces a new mathematical tool called Transductive Local Complexity (TLC) to better understand and improve how models generalize in this setting. This builds on a successful idea from traditional learning theory called Local Rademacher Complexity, but adapting it to transductive learning posed major challenges that hadn’t been solved — until now.
We not only answer this open question, but also apply TLC to analyze a popular algorithm called Transductive Kernel Learning, showing that it enjoys better guarantees than previously known. Along the way, we also developed sharper mathematical techniques that could benefit other areas of machine learning and statistics.
Primary Area: Theory->Learning Theory
Keywords: Concentration Inequality, Sampling Without Replacement, Transductive Local Rademacher Complexity, Transductive Learning
Submission Number: 13647