Abstract: Stochastic gradient descent (SGD) has become a standard method for optimizing deep neural networks (DNNs), and it essentially relies on first-order derivatives. Incorporating fractional derivatives into learning algorithms is expected to improve model performance, especially when the corresponding optimization problems involve objective functions with memory effects or long-range dependencies. The Caputo derivative is a fractional derivative that remains consistent with integer-order calculus and yields more reliable solutions than other fractional derivatives, particularly for differential equations. In this paper, we propose a novel Caputo-based SGD algorithm tailored for training DNNs. Our method exploits the Caputo $L_1$ scheme to achieve highly effective training and accurate prediction on large datasets, using gradient information from past iterations to guide parameter updates in a more informed direction. This helps the optimizer escape local minima and saddle points, resulting in faster convergence to the target value. We conducted experiments on several benchmark datasets to evaluate our method. The results show that it improves empirical performance over several traditional optimization methods in both accuracy and convergence speed.
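The abstract describes an update rule that weights past gradients via the Caputo $L_1$ scheme; the exact algorithm appears only in the paper itself. The Python sketch below is a hedged illustration, assuming that the $L_1$ coefficients $b_k = (k+1)^{1-\alpha} - k^{1-\alpha}$ multiply a finite window of recent stochastic gradients. The class name `CaputoL1SGD`, the `history` window length, and the learning-rate handling are hypothetical choices for this sketch and are not taken from the paper.

```python
import numpy as np
from math import gamma
from collections import deque

class CaputoL1SGD:
    """Illustrative SGD variant that weights a window of recent gradients
    with Caputo L1 coefficients b_k = (k+1)^(1-alpha) - k^(1-alpha).

    This is only a plausible sketch of how such weights could enter the
    update; it is not the paper's algorithm."""

    def __init__(self, lr=0.01, alpha=0.7, history=10):
        assert 0.0 < alpha < 1.0, "Caputo order is assumed to lie in (0, 1)"
        self.lr = lr
        # L1 weights, largest for the most recent gradient (k = 0),
        # normalized by Gamma(2 - alpha) as in the L1 discretization.
        self.weights = np.array(
            [(k + 1) ** (1 - alpha) - k ** (1 - alpha) for k in range(history)]
        ) / gamma(2 - alpha)
        self.grads = deque(maxlen=history)  # most recent gradient first

    def step(self, params, grad):
        self.grads.appendleft(grad)
        # Weighted combination of past gradients (memory effect).
        update = sum(w * g for w, g in zip(self.weights, self.grads))
        return params - self.lr * update


# Toy usage: minimize f(x) = ||x||^2 with the sketch above.
opt = CaputoL1SGD(lr=0.1, alpha=0.7, history=5)
x = np.array([3.0, -2.0])
for _ in range(50):
    x = opt.step(x, 2 * x)  # gradient of ||x||^2
print(x)  # should end up close to the origin
```

In this toy run, the memory window makes each step depend on several past gradients rather than only the latest one, which is the qualitative behavior the abstract attributes to the Caputo-based update.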
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Simon_Lacoste-Julien1
Submission Number: 2325