Abstract: In this paper, we propose Input Normalized Stochastic Gradient Descent (INSGD), a novel optimization algorithm for training machine learning models, inspired by the Normalized Least Mean Squares (NLMS) algorithm used in adaptive filtering. When training complex models on large datasets, the choice of optimizer parameters, particularly the learning rate, is crucial to avoid divergence. Our algorithm updates the network weights using stochastic gradient descent with $\ell_1$- and $\ell_2$-based normalizations applied to the learning rate, similar to NLMS. However, unlike existing normalization methods, we exclude the error term from the normalization process and instead normalize the update term using the input vector to the neuron. Our experiments demonstrate that our optimization algorithm achieves higher accuracy than the baselines across different initialization settings. We evaluate the efficiency of our training algorithm on benchmark datasets using ResNet-20, WResNet-18, ResNet-50, and a toy neural network. Our INSGD algorithm improves the accuracy of ResNet-20 on CIFAR-10 from 92.55\% to 92.80\%, MobileNetV3 on CIFAR-10 from 90.83\% to 91.13\%, WResNet-18 on CIFAR-100 from 78.75\% to 78.85\%, and ResNet-50 on ImageNet-1K from 75.56\% to 75.89\%.
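The update described in the abstract — an SGD step whose learning rate is normalized by the $\ell_1$ or $\ell_2$ norm of the neuron's input vector, in the spirit of NLMS — can be sketched as follows. This is a minimal illustration based only on the abstract, not the paper's exact update rule; the function name `insgd_step` and the `eps` stabilizer are assumptions for the example.

```python
import numpy as np

def insgd_step(w, x, grad, lr=0.1, norm="l2", eps=1e-8):
    """One illustrative INSGD-style update (sketch, not the paper's exact rule).

    w    : weight vector of a neuron
    x    : input vector to that neuron (used for normalization, as in NLMS)
    grad : stochastic gradient of the loss w.r.t. w
    norm : "l1" or "l2" normalization of the learning rate
    eps  : small constant (assumed) to avoid division by zero
    """
    if norm == "l2":
        scale = eps + np.dot(x, x)      # squared l2 norm, as in classical NLMS
    else:
        scale = eps + np.sum(np.abs(x))  # l1 norm variant
    # Learning rate shrinks for large-norm inputs, curbing divergence.
    return w - (lr / scale) * grad
```

Note how the effective step size adapts to the input: a large-norm input yields a proportionally smaller update, which is the NLMS-style stability mechanism the abstract alludes to.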
Submission Length: Long submission (more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=9PQ7J7a3OS
Changes Since Last Submission: The previous submission was desk-rejected because the manuscript provided a GitHub link that included the name of one of the authors. That link has been removed from the manuscript. The code is not attached as supplementary material due to the file size limit.
Assigned Action Editor: ~Konstantin_Mishchenko1
Submission Number: 1464