Multi-label Learning for Large Text Corpora using Latent Variable Model with Provable Guarantees

Sayantan Dasgupta

15 Feb 2018 (modified: 15 Feb 2018), ICLR 2018 Conference Blind Submission, Readers: Everyone

Abstract: We study the problem of learning labels for large text corpora where each document can be assigned a variable number of labels. The problem is straightforward when the label dimensionality is small, since it can be solved with a series of one-vs-all classifiers. However, as the label dimensionality increases, the parameter space of such one-vs-all classifiers becomes extremely large and exceeds the available memory. We propose a latent variable model that reduces the size of the parameter space while still learning the labels efficiently. We learn the model using spectral learning and show how to extract the parameters using only three passes through the training dataset. Further, we analyse the sample complexity of our model using PAC learning theory and demonstrate the performance of our algorithm on several benchmark datasets in comparison with existing algorithms.

Keywords: Spectral Method, Multi-label Learning, Tensor Factorisation
