Abstract: The knowledge that data lies close to a particular submanifold of the ambient Euclidean
space may be useful in a number of ways. For
instance, one may want to automatically mark
any point far away from the submanifold as
an outlier or to use the geometry to come up
with a better distance metric. Manifold learning problems are often posed in very high
dimensions, e.g. for spaces of images or spaces
of words. Today, with deep representation
learning on the rise in areas such as computer
vision and natural language processing, many
problems of this kind may be transformed
into problems of moderately high dimension,
typically of the order of hundreds. Motivated
by this, we propose a manifold learning technique suitable for moderately high dimensions
and large datasets. The manifold is learned
from the training data in the form of an intersection of quadric hypersurfaces, which are simple but
expressive objects. At test time, this manifold
can be used to introduce a computationally
efficient outlier score for arbitrary new data
points and to improve a given similarity metric by incorporating the learned geometric
structure into it.
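
To make the idea of an intersection of quadrics concrete, the following is a minimal sketch and not the method of this paper: it assumes a plain algebraic least-squares fit, in which each quadric is a coefficient vector over all monomials of degree at most two, and the k quadrics are taken as the right singular vectors of the feature matrix with the smallest singular values. The function names (quadratic_features, fit_quadrics, outlier_score) and the toy unit-circle data are hypothetical and chosen only for illustration. The score is the norm of the quadric values at a point, which is an algebraic residual rather than a true geometric distance to the manifold.

```python
import numpy as np

def quadratic_features(X):
    """Map each row x to all monomials of degree <= 2: [1, x_i, x_i * x_j for i <= j]."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    n, d = X.shape
    iu, ju = np.triu_indices(d)
    return np.hstack([np.ones((n, 1)), X, X[:, iu] * X[:, ju]])

def fit_quadrics(X, k):
    """Return k unit-norm coefficient vectors (as rows) whose quadrics nearly vanish on X.

    Assumption: algebraic least-squares fit via SVD, used here only as an illustration.
    """
    Phi = quadratic_features(X)
    # The right singular vectors with the smallest singular values
    # minimise ||Phi c|| subject to ||c|| = 1.
    _, _, Vt = np.linalg.svd(Phi, full_matrices=False)
    return Vt[-k:]

def outlier_score(C, X):
    """Norm of the k quadric values at each point (algebraic residual, not geometric distance)."""
    return np.linalg.norm(quadratic_features(X) @ C.T, axis=1)

# Toy data: points on the unit circle in the plane, a manifold cut out by one quadric.
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, 200)
X_train = np.column_stack([np.cos(theta), np.sin(theta)])

C = fit_quadrics(X_train, k=1)
scores = outlier_score(C, np.array([[1.0, 0.0], [3.0, 3.0]]))
print(scores)  # first point lies on the circle, second lies far from it
```

In this toy setting the point on the circle receives a score close to zero, while the distant point receives a much larger one. In moderately high dimension the number of quadratic monomials grows quadratically with the dimension, so a practical method would need a more careful fitting procedure than this direct SVD.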