Keywords: PAC learning, surrogate losses, proper composite losses, multiclass classification, multi-label prediction, subset ranking, generalized linear models
Abstract: A central question in the theory of machine learning concerns the identification of classes of data distributions for which one can provide computationally efficient learning algorithms with provable statistical learning guarantees. Indeed, in the context of probably approximately correct (PAC) learning, there has been much interest in exploring intermediate PAC learning models that, unlike the realizable PAC learning setting, allow for some stochasticity in the labels, and unlike the fully agnostic PAC learning setting, also admit computationally efficient learning algorithms with finite sample complexity bounds. Examples of such models include random classification noise (RCN), probabilistic concepts, Massart noise, and generalized linear models (GLMs); in general, most of this work has focused on binary classification problems. In this paper, we study what we call realizable-statistic models (RSMs), wherein we allow stochastic labels but assume that some vector-valued statistic of the conditional label distribution comes from a known function class. RSMs are a flexible class of models that interpolate between the realizable and fully agnostic settings, and that recover several previously studied models as special cases. We show that for a broad range of RSM learning problems, where the statistic of interest can be accurately estimated via a convex ‘strongly proper composite’ surrogate loss, minimizing this convex surrogate loss yields a computationally efficient learning algorithm with finite sample complexity bounds. We then apply this result to show that various commonly used (and in some cases, not so commonly used) convex surrogate risk minimization algorithms yield computationally efficient learning algorithms with finite sample complexity bounds for a variety of RSM learning problems, including binary classification, multiclass classification, multi-label prediction, and subset ranking. For the special case of binary classification with sigmoid-of-linear class probabilities (also a special case of GLMs), our results show that minimizing the standard binary logistic loss achieves a sample complexity similar to that of the GLM-tron algorithm of Kakade et al. (2011), while being computationally more efficient. Our results are distribution-independent with respect to the distribution over the domain/instance space. To our knowledge, these are the first such results for PAC learning with stochastic labels for such a broad range of learning problems.
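To make the logistic-loss special case concrete, here is a minimal Python sketch, assuming the sigmoid-of-linear model E[y | x] = σ(⟨w*, x⟩). It contrasts plain gradient descent on the convex binary logistic loss (a strongly proper composite surrogate) with a GLM-tron-style iteration in the spirit of Kakade et al. (2011). All names and hyperparameters here are illustrative, not the paper's exact procedure or sample complexity analysis.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_loss_erm(X, y, lr=0.5, n_iters=2000):
    """Gradient descent on the average binary logistic loss (y in {0, 1}).

    Under E[y | x] = sigmoid(<w*, x>), the logistic loss is a proper
    composite surrogate, so its empirical minimizer estimates w*.
    Illustrative sketch only; step size and iteration count are ad hoc.
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        p = sigmoid(X @ w)
        grad = X.T @ (p - y) / n  # gradient of the mean logistic loss
        w -= lr * grad
    return w

def glm_tron(X, y, n_iters=2000):
    """GLM-tron-style update (after Kakade et al., 2011), for comparison:
    w <- w + (1/n) * sum_i (y_i - sigmoid(<w, x_i>)) x_i, no step size."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        w += X.T @ (y - sigmoid(X @ w)) / n
    return w

# Hypothetical usage on synthetic sigmoid-of-linear data:
rng = np.random.default_rng(0)
w_star = rng.normal(size=5)
X = rng.normal(size=(2000, 5))
y = rng.binomial(1, sigmoid(X @ w_star)).astype(float)
w_hat = logistic_loss_erm(X, y)
```

On such data both iterations drive w toward w*; the point of the comparison in the abstract is that direct minimization of the convex logistic surrogate admits a similar sample complexity guarantee while being a simpler, cheaper computation.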
Primary Area: Theory (e.g., control theory, learning theory, algorithmic game theory)
Submission Number: 11113