A Framework for Hypothesis Learning Over Sets of Vectors

Karim Abou-Moustafa, Frank Ferrie

30 Jan 2023OpenReview Archive Direct UploadReaders: Everyone

Abstract: Sets of vectors, or bags of features, are a common data representation in domains such as computer vision and speech recognition. However, learning a hypothesis (classification, clustering, etc.) over sets of vectors is usually hindered by their particular structure, in which each object in a data set is represented by a different number of vectors of fixed dimensionality. This nonuniform format of the input data requires the learning algorithm to implicitly handle this non-regular type of input, either by unifying the format of the input, or by extracting the necessary information out of it. In this paper we propose an unsupervised learning frame- work for unifying the representation of sets of vectors. The framework defines a metric space over probability distributions representing the sets of vectors, followed by a spectral embedding step for these distributions. The spectral embed- ding step offers an implicit clustering for the data, combined with a reduction – by orders of magnitude – in the data’s space complexity, resulting in significantly faster hypothesis learning over the sets of vectors. Moreover, it allows the framework to easily generalize to out-of-sample examples using the Nystrom formula. Although the framework is application independent, we test its validity in the context of human action recognition from video sequences. Besides the previously mentioned properties, the framework does in- deed show better performance than other approaches in the literature.

0 Replies