Stochastic Fractional Gradient Descent with Caputo $L_1$ Scheme for Deep Neural Networks

TMLR Paper 2325 Authors

04 Mar 2024 (modified: 19 Mar 2024) · Under review for TMLR
Abstract: Stochastic gradient descent (SGD) is the standard method for optimizing deep neural networks (DNNs), and it relies essentially on first-order derivatives. Incorporating fractional derivatives into learning algorithms is expected to improve model performance, especially when the corresponding optimization problems involve objective functions with memory effects or long-range dependencies. The Caputo derivative is a fractional derivative that maintains consistency with integer-order calculus and produces more reliable solutions than other fractional derivatives, especially for differential equations. In this paper, we propose a novel Caputo-based SGD algorithm tailored for training DNNs. Our method exploits the Caputo $L_1$ scheme to achieve effective training and accurate prediction on large datasets by using gradient information from past iterations to guide parameter updates in a more informed direction. This helps the optimizer escape local minima and saddle points, resulting in faster convergence to the target value. We conducted experiments on several benchmark datasets to evaluate our method. The results show that our method improves empirical performance over several traditional optimization methods in both accuracy and convergence speed.
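The abstract does not spell out the update rule, but the standard Caputo $L_1$ discretization of a fractional derivative of order $\alpha \in (0, 1)$ uses the coefficients $b_k = (k+1)^{1-\alpha} - k^{1-\alpha}$ and a $1/\Gamma(2-\alpha)$ scaling. The sketch below is a minimal, hypothetical illustration of how such weights could combine past stochastic gradients into an SGD update; the class name `CaputoL1SGD`, the truncated `history_len` buffer, and the toy quadratic are assumptions for illustration, not the paper's exact method.

```python
import numpy as np
from math import gamma


def caputo_l1_weights(alpha: float, history_len: int) -> np.ndarray:
    """L1-scheme coefficients b_k = (k+1)^(1-alpha) - k^(1-alpha), k = 0..history_len-1."""
    k = np.arange(history_len, dtype=float)
    return (k + 1.0) ** (1.0 - alpha) - k ** (1.0 - alpha)


class CaputoL1SGD:
    """Hypothetical fractional SGD sketch: the update direction is a Caputo-L1-weighted
    combination of the current and recent stochastic gradients (most recent first)."""

    def __init__(self, lr: float = 0.01, alpha: float = 0.9, history_len: int = 5):
        self.lr = lr
        self.alpha = alpha                    # fractional order in (0, 1)
        self.history_len = history_len        # how many past gradients to retain (assumption)
        self.history = []                     # gradient buffer, newest first
        self.scale = 1.0 / gamma(2.0 - alpha)

    def step(self, params: np.ndarray, grad: np.ndarray) -> np.ndarray:
        # Push the newest gradient to the front of the bounded history buffer.
        self.history.insert(0, grad)
        self.history = self.history[: self.history_len]

        # Weight current and past gradients with the Caputo L1 coefficients.
        w = caputo_l1_weights(self.alpha, len(self.history))
        direction = self.scale * sum(wk * gk for wk, gk in zip(w, self.history))

        return params - self.lr * direction


# Toy usage on the quadratic loss f(x) = 0.5 * ||x||^2, whose gradient is x.
opt = CaputoL1SGD(lr=0.1, alpha=0.7)
x = np.array([3.0, -2.0])
for _ in range(100):
    x = opt.step(x, grad=x)
```

Because $b_k$ decreases monotonically in $k$ for $\alpha \in (0, 1)$, the weighted direction emphasizes recent gradients while still retaining a memory of earlier ones, which matches the abstract's description of using past gradient history to guide updates.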
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Simon_Lacoste-Julien1
Submission Number: 2325