On the theoretical limit of gradient descent for Simple Recurrent Neural Networks with finite precision

Published: 19 Dec 2024, Last Modified: 19 Dec 2024
Accepted by TMLR
License: CC BY 4.0
Abstract: Despite their great practical successes, the behavior of neural networks remains an active research topic. In particular, the class of functions learnable under a finite-precision configuration is an open question. In this paper, we study the limits of gradient descent when such a configuration is imposed on the class of Simple Recurrent Networks (SRN). We exhibit conditions under which gradient descent will provably fail. We also design a class of SRNs based on Deterministic Finite State Automata (DFA) that fulfills these failure conditions. The definition of this class is constructive: we propose an algorithm that, from any DFA, constructs an SRN computing exactly the same function, a result of interest in its own right.
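To give a concrete sense of the kind of DFA-to-SRN construction the abstract refers to, below is a minimal illustrative sketch. It is not the paper's algorithm (see the linked code repository for that); it uses the classic one-hidden-unit-per-(state, symbol) encoding with a hard-threshold recurrent update h_{t+1} = step(W h_t + U x_t + b). The toy DFA, the `accepts` helper, and all weight choices here are assumptions made for illustration only.

```python
import numpy as np

# Hypothetical sketch (not the paper's exact construction): embed a DFA into a
# simple recurrent update h_{t+1} = step(W @ h_t + U @ x_t + b) with one
# hidden unit per (state, symbol) pair and a hard threshold as nonlinearity.

# Toy DFA over {0, 1} accepting strings with an even number of 1s.
n_states, n_symbols = 2, 2
delta = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}  # (state, symbol) -> next state
start, accepting = 0, {0}

n_hidden = n_states * n_symbols
idx = lambda q, a: q * n_symbols + a  # unit meaning "now in state q, just read a"

W = np.zeros((n_hidden, n_hidden))   # recurrent weights
U = np.zeros((n_hidden, n_symbols))  # input weights
b = -1.5 * np.ones(n_hidden)         # bias acting as an AND threshold

for q in range(n_states):
    for a_prev in range(n_symbols):
        for a in range(n_symbols):
            # Unit (delta(q, a), a) should fire iff some unit (q, *) is on
            # AND the current input symbol is a.
            W[idx(delta[(q, a)], a), idx(q, a_prev)] = 1.0
for q in range(n_states):
    for a in range(n_symbols):
        U[idx(q, a), a] = 1.0

def accepts(word):
    # Initialize as if the start state had just been reached on symbol 0.
    h = np.zeros(n_hidden)
    h[idx(start, 0)] = 1.0
    for a in word:
        x = np.eye(n_symbols)[a]
        h = (W @ h + U @ x + b >= 0.0).astype(float)  # hard-threshold step
    current_state = int(np.argmax(h)) // n_symbols
    return current_state in accepting

assert accepts([1, 1]) and accepts([]) and not accepts([1, 0, 0])
```

With exact (one-hot) activations this simulation is exact; the paper's point about finite precision and gradient descent concerns what happens when such networks must be *learned* rather than constructed.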
Submission Length: Long submission (more than 12 pages of main content)
Video: https://www.youtube.com/watch?v=ap6LOok_Vtk&ab_channel=VolodimirMitarchuk
Code: https://github.com/23Vladymir57/TMLR_Code
Assigned Action Editor: ~Lechao_Xiao2
Submission Number: 3124