Gradient Descent Learning With Floats

Tao Sun, Ke Tang, Dongsheng Li

Published: 2022, Last Modified: 14 May 2023 | IEEE Trans. Cybern. 2022 | Readers: Everyone

Abstract: The gradient descent method is the main workhorse of training tasks in artificial intelligence and machine-learning research. Current theoretical studies of gradient descent consider only continuous domains, which is unrealistic because electronic computers store and process data as floating-point numbers. Although existing results suffice for the extremely tiny errors of high-precision machines, they need to be improved for low-precision cases. This article presents an understanding of the learning algorithm on computers with floats. The performance of three gradient descent methods in the floating-point domain is investigated when the objective function is smooth. When the function is additionally assumed to satisfy the PŁ condition, the convergence speed can be improved. We prove that for floating gradient descent to reach an error of $\epsilon$, the number of iterations is $O(1/\epsilon)$ in the general smooth case and $O(\ln(1/\epsilon))$ in the PŁ case. However, $\epsilon$ must be larger than the $s$-bit machine epsilon $\delta(s)$ in the deterministic case, that is, $\epsilon \geq \Omega(\delta(s))$, while $\epsilon \geq \Omega(\sqrt{\delta(s)})$ in the stochastic case. Floating stochastic and sign gradient descents can both output an $\epsilon$-noised result in $O(1/\epsilon^{2})$ iterations.
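The precision floor described in the abstract can be seen in a toy experiment. The sketch below is not the paper's setup: it assumes a one-dimensional quadratic objective $f(x) = \frac{1}{2}(x - c)^2$, simulates 16-bit floats by rounding every intermediate value through Python's `struct` half-precision format, and uses hypothetical helper names (`to_half`, `gradient_descent`). It only illustrates how deterministic gradient descent can stall once the update falls below the spacing of representable numbers.

```python
import struct


def to_half(x: float) -> float:
    """Round x to the nearest IEEE 754 binary16 value (assumed stand-in for a low-precision machine)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def gradient_descent(rnd, x0, c, lr, steps):
    """Minimize f(x) = 0.5 * (x - c)**2, passing every stored quantity through rnd."""
    x, c, lr = rnd(x0), rnd(c), rnd(lr)
    for _ in range(steps):
        grad = rnd(x - c)            # gradient of the toy quadratic
        x = rnd(x - rnd(lr * grad))  # rounded update; tiny steps can be rounded away
    return x


TARGET = 1.0 / 3.0  # true minimizer, not exactly representable in binary floating point

x16 = gradient_descent(to_half, 1.0, TARGET, 0.1, 500)  # simulated 16-bit run
x64 = gradient_descent(float, 1.0, TARGET, 0.1, 500)    # ordinary double-precision run

err16 = abs(x16 - TARGET)
err64 = abs(x64 - TARGET)

print(f"float16 machine epsilon (2**-10): {2.0 ** -10:.3e}")
print(f"distance to minimizer, float16:   {err16:.3e}")
print(f"distance to minimizer, float64:   {err64:.3e}")
```

In this toy run the 16-bit iterate stops moving while still a visible distance from the minimizer, whereas the 64-bit iterate gets far closer. This matches the qualitative message that the achievable $\epsilon$ is limited by the machine epsilon, though the example says nothing about the constants or rates proved in the paper.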
