Abstract: Stochastic gradient descent (SGD) is a simple and efficient method for solving large-scale stochastic optimization problems in machine learning. It has been shown that the convergence rate of SGD can be improved by the α-suffix averaging technique, abbreviated as SGD-α. Classic analyses usually require the assumption of unbiased gradient estimates, which does not hold in many practical applications of SGD-α, such as non-independently and identically distributed (non-i.i.d.) scenarios. Another limitation is that SGD-α needs to store all iterates in memory and thus cannot be implemented on the fly. To address these issues, we employ a rounding technique to propose a real-time version of SGD-α (named SGD-rα), which computes the α-suffix average iteratively and enjoys the same convergence rate as SGD-α. In particular, SGD-rα with biased gradient estimates attains a sublinear convergence rate for strongly convex objectives. Numerical experiments on benchmark datasets illustrate the characteristics of SGD-rα and corroborate the theoretical results. The implementation of SGD-rα is available at: https://github.com/xudp100/SGD-ra.
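To give a concrete feel for suffix averaging computed on the fly, the minimal sketch below runs SGD and maintains a running mean of only the last ⌈αT⌉ iterates, so no trajectory is stored. It is an illustration only, not the paper's SGD-rα: it assumes a known horizon T, and the function names (sgd_suffix_average, noisy_grad), step-size schedule, and rounding rule are placeholders introduced here.

```python
import numpy as np

def sgd_suffix_average(grad, x0, T, alpha=0.5, eta=lambda t: 1.0 / (t + 1)):
    """Run T SGD steps and return the average of the last ceil(alpha*T)
    iterates, updated incrementally instead of storing all iterates."""
    x = np.asarray(x0, dtype=float)
    suffix_start = T - int(round(alpha * T))      # first step counted in the suffix
    suffix_avg, count = np.zeros_like(x), 0
    for t in range(T):
        x = x - eta(t) * grad(x, t)               # SGD step with a (possibly noisy/biased) gradient
        if t >= suffix_start:
            count += 1
            suffix_avg += (x - suffix_avg) / count  # incremental mean of suffix iterates
    return suffix_avg

# Toy usage: noisy gradients of the strongly convex f(x) = 0.5 * ||x||^2.
rng = np.random.default_rng(0)
noisy_grad = lambda x, t: x + 0.1 * rng.standard_normal(x.shape)
print(sgd_suffix_average(noisy_grad, np.ones(5), T=1000, alpha=0.5))
```

The incremental-mean update is what removes the memory requirement; how the suffix boundary is rounded and how an unknown horizon is handled are exactly the points the paper's rounding technique addresses.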