Keywords: lattices, representation learning, coding theory, lossy source coding, information theory
TL;DR: We propose to use lattices to represent objects and prove a fundamental result on how to train networks that use them.
Abstract: We introduce the notion of \emph{lattice representation learning}, in which the representation of an object of interest (e.g., a sentence or an image) is a lattice point in a Euclidean space. Our main contribution is a result that replaces an objective function employing lattice quantization with one in which quantization is absent, allowing optimization techniques based on gradient descent to apply; we call the resulting algorithms \emph{dithered stochastic gradient descent} algorithms, as they are designed explicitly to allow an optimization procedure that uses only local information. We also argue that a technique commonly used in Variational Auto-Encoders (Gaussian priors and Gaussian approximate posteriors) is tightly connected to the idea of lattice representations, since the quantization error of good high-dimensional lattices can be modeled as a Gaussian distribution. We use a traditional encoder/decoder architecture to explore lattice-valued representations, and provide experimental evidence of their potential by modifying the \texttt{OpenNMT-py} generic \texttt{seq2seq} architecture so that it can implement not only Gaussian dithering of representations but also the well-known straight-through estimator and its application to vector quantization.
Code: https://www.dropbox.com/s/y6pvq34xh7tkory/ICLR_SUBMISSION.zip?dl=0
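The abstract contrasts training with dithered (noise-perturbed) representations, where quantization is removed from the training objective, against straight-through quantization. The sketch below is a minimal, hypothetical PyTorch illustration of these two strategies; the function names are ours, not taken from the released code, and the integer lattice is used as the simplest stand-in for a general lattice quantizer.

```python
import torch

def round_to_lattice(z):
    # Simplest possible lattice quantizer: round to the integer lattice Z^n.
    # A real lattice quantizer would project onto a denser lattice instead.
    return torch.round(z)

def straight_through_quantize(z):
    # Straight-through estimator: the forward pass uses the quantized value,
    # while the backward pass treats quantization as the identity map.
    return z + (round_to_lattice(z) - z).detach()

def dithered_representation(z, scale=1.0):
    # Dithered training: replace quantization with additive noise whose
    # distribution mimics the lattice quantization error; for good
    # high-dimensional lattices this error is approximately Gaussian,
    # so the objective stays differentiable everywhere.
    return z + scale * torch.randn_like(z)

# Usage with a hypothetical encoder output z of shape (batch, dim):
z = torch.randn(8, 16, requires_grad=True)
z_st = straight_through_quantize(z)    # discrete forward pass, copied gradients
z_dith = dithered_representation(z)    # smooth objective, exact gradients
```

At test time one would quantize the representation to the nearest lattice point; the dithered objective is only a training-time surrogate for the quantized one.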