Abstract: To enable domain adaptation of AI on edge devices with fast convergence and low memory, we present a novel backpropagation-free optimization algorithm dubbed Target Projection Stochastic Gradient Descent (tpSGD). tpSGD uses layer-wise stochastic gradient descent (SGD) and local targets generated via random projections of the labels to train the network layer-by-layer with only forward passes. It does not require retaining gradients during optimization, greatly reducing memory allocation compared to backpropagation (BP)-based SGD methods. Compared to other target projection methods, tpSGD generalizes the concept to arbitrary local layer-wise loss functions. Our method performs comparably to BP gradient descent, within ~5% accuracy, on relatively shallow networks of fully connected, convolutional, recurrent, and transformer layers. tpSGD also outperforms other state-of-the-art gradient-free algorithms, achieving competitive accuracy with less memory and compute time.
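To make the layer-wise training scheme concrete, the following is a minimal sketch of the idea described in the abstract: each layer is trained against a local target obtained by a fixed random projection of the labels, using a closed-form local gradient so no error signal is propagated back through earlier layers. The two-layer dense network, tanh activation, MSE local loss, and all hyperparameters here are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: inputs x (batch, in_dim) and one-hot labels y (batch, n_classes).
batch, in_dim, hidden, n_classes = 32, 20, 64, 10
x = rng.normal(size=(batch, in_dim))
y = np.eye(n_classes)[rng.integers(0, n_classes, size=batch)]

# Two dense layers; the final layer's target is the label itself.
dims = [in_dim, hidden, n_classes]
W = [rng.normal(scale=0.1, size=(dims[i], dims[i + 1])) for i in range(2)]
b = [np.zeros(dims[i + 1]) for i in range(2)]

# Fixed random projection of the labels into the hidden layer's output space.
R = rng.normal(scale=1.0 / np.sqrt(n_classes), size=(n_classes, hidden))
targets = [y @ R, y]  # local target for each layer

lr = 0.05
for step in range(100):
    a = x
    for l in range(2):
        z = a @ W[l] + b[l]
        out = np.tanh(z)
        # Local MSE loss against the projected target; its gradient with
        # respect to this layer's own weights is computed in closed form,
        # so no gradients are stored or passed between layers.
        delta = (out - targets[l]) * (1.0 - out ** 2)  # d(MSE)/dz
        W[l] -= lr * a.T @ delta / batch
        b[l] -= lr * delta.mean(axis=0)
        a = out  # output becomes the (detached) input to the next layer
```

In this sketch, each layer's update depends only on its own input, output, and projected target, which is what removes the need to retain cross-layer gradients during optimization.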