Abstract: In a recent paper, Ling et al. investigated the over-parametrized Deep Equilibrium Model (DEQ) with ReLU activation. They proved that gradient descent converges to a globally optimal solution at a linear rate for the quadratic loss function. This paper shows that the same result holds for DEQs with any activation that has bounded first and second derivatives. Since such an activation is generally non-homogeneous, bounding the least eigenvalue of the Gram matrix of the equilibrium point is particularly challenging. To accomplish this, we construct a novel population Gram matrix and develop a new form of dual activation with Hermite polynomial expansion.
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: Dear Reviewers,
Based on your comments and suggestions, we have revised our paper. The major changes in this revised version are as follows:
+ Added a weight initialization algorithm and demonstrated that it will terminate with high probability.
+ Corrected typos and addressed imprecisions in the paper.
+ Improved the presentation by removing equations that are not referenced later in the text.
Looking forward to receiving your feedback.
Assigned Action Editor: ~Nadav_Cohen1
Submission Number: 3442