Abstract: Development systems for deep learning, such as Theano, Torch,
TensorFlow, or MXNet, are easy-to-use tools for creating complex
neural network models. Since gradient computations are automatically
baked in and execution is mapped to high-performance hardware, these
models can be trained end-to-end on large amounts of data. However, it
is currently not easy to implement many basic machine learning
primitives in these systems (such as Gaussian processes, least squares
estimation, principal component analysis, Kalman smoothing), mainly
because they lack efficient support for linear algebra primitives as
differentiable operators. We detail how a number of matrix
decompositions (Cholesky, LQ, symmetric eigen) can be implemented as
differentiable operators. We have implemented these primitives in
MXNet, running on CPU and GPU in single and double precision. We
sketch use cases for these new operators, such as learning Gaussian process and
Bayesian linear regression models. Our implementation is based on
BLAS/LAPACK APIs, for which highly tuned implementations are available
on all major CPUs and GPUs.
TL;DR: We implement Cholesky factorization, LQ factorization, symmetric eigendecomposition, and other linear algebra primitives as differentiable operators in MXNet, enabling machine learning algorithms such as Gaussian processes, Bayesian linear regression, and Kalman filtering to be expressed directly within the framework.
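The following is a minimal sketch, not taken from the paper, of how such differentiable operators can be used for the Gaussian process use case: it fits GP hyperparameters by gradient descent on the negative log marginal likelihood, relying on MXNet's linalg.potrf (Cholesky), linalg.trsm (triangular solve), and linalg.sumlogdiag operators together with autograd. The RBF kernel, synthetic data, step size, and parameter names (log_ls, log_noise) are illustrative assumptions.

import math
import mxnet as mx
from mxnet import autograd, nd

def rbf_kernel(x, log_ls):
    # Squared-exponential kernel matrix for 1-D inputs x of shape (n, 1).
    d = x - x.T                                   # pairwise differences, shape (n, n)
    return nd.exp(-0.5 * (d ** 2) / nd.exp(2.0 * log_ls))

def gp_neg_log_marginal(x, y, log_ls, log_noise):
    # Negative log marginal likelihood of a zero-mean GP with Gaussian noise.
    n = x.shape[0]
    kmat = rbf_kernel(x, log_ls) + nd.exp(2.0 * log_noise) * nd.eye(n)
    chol = nd.linalg.potrf(kmat)                  # lower Cholesky factor L of K
    z = nd.linalg.trsm(chol, y.reshape((n, 1)),   # solve L z = y
                       transpose=False, rightside=False)
    return (0.5 * nd.sum(z ** 2)                  # 0.5 * y^T K^{-1} y
            + nd.linalg.sumlogdiag(chol)          # 0.5 * log |K|
            + 0.5 * n * math.log(2.0 * math.pi))

# Illustrative synthetic data and hyperparameters (assumed, not from the paper).
x = nd.random.uniform(shape=(20, 1))
y = nd.sin(3.0 * x).reshape((20,)) + 0.1 * nd.random.normal(shape=(20,))
log_ls, log_noise = nd.array([0.0]), nd.array([-1.0])
for p in (log_ls, log_noise):
    p.attach_grad()
for _ in range(100):
    with autograd.record():
        loss = gp_neg_log_marginal(x, y, log_ls, log_noise)
    loss.backward()                               # gradients flow through potrf/trsm
    for p in (log_ls, log_noise):
        p[:] = p - 0.05 * p.grad                  # plain gradient-descent update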
Keywords: linear algebra, reverse mode differentiation, mxnet