Abstract: In this empirical article, we introduce INNAprop, an optimization algorithm that combines the INNA method with RMSprop adaptive gradient scaling. It leverages second-order information and rescaling while keeping the memory and compute requirements of standard deep learning methods such as AdamW or SGD. We evaluate INNAprop on CIFAR-10, Food101, and ImageNet with ResNets, VGG, DenseNet, and ViT. We also train GPT-2 on OpenWebText from scratch and with LoRA fine-tuning on E2E. INNAprop consistently performs on par with AdamW, and significantly better in our LLM training experiments, achieving faster convergence and higher accuracy with minimal hyperparameter tuning, even at large scale. Our code is publicly available.
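To give an intuition of the design described in the abstract, here is a minimal, hypothetical sketch of how an INNA-style two-variable update could be combined with RMSprop gradient rescaling in PyTorch. This is not the authors' INNAprop update rule (that is defined in the paper and implemented in the linked repository); the class name, hyperparameter names, and default values below are illustrative assumptions only.

```python
# Illustrative sketch only: a hypothetical combination of an INNA-style
# two-variable scheme with RMSprop-style rescaling, NOT the official INNAprop.
import torch
from torch.optim import Optimizer


class InnaRmspropSketch(Optimizer):
    """Hypothetical INNA + RMSprop hybrid; names and defaults are placeholders."""

    def __init__(self, params, lr=1e-3, alpha=0.5, beta=0.8, rho=0.99, eps=1e-8):
        defaults = dict(lr=lr, alpha=alpha, beta=beta, rho=rho, eps=eps)
        super().__init__(params, defaults)

    @torch.no_grad()
    def step(self, closure=None):
        loss = None
        if closure is not None:
            with torch.enable_grad():
                loss = closure()
        for group in self.param_groups:
            lr, a, b = group["lr"], group["alpha"], group["beta"]
            rho, eps = group["rho"], group["eps"]
            for p in group["params"]:
                if p.grad is None:
                    continue
                state = self.state[p]
                if len(state) == 0:
                    state["psi"] = p.detach().clone()       # INNA auxiliary variable
                    state["sq_avg"] = torch.zeros_like(p)   # RMSprop second-moment EMA
                psi, sq_avg = state["psi"], state["sq_avg"]

                # RMSprop-style rescaling of the stochastic gradient.
                sq_avg.mul_(rho).addcmul_(p.grad, p.grad, value=1 - rho)
                scaled_grad = p.grad / (sq_avg.sqrt() + eps)

                # INNA-style coupled update of (theta, psi): the geometric damping
                # comes from mixing the parameter with the auxiliary variable.
                drift = (1.0 / b - a) * p - (1.0 / b) * psi
                psi.add_(drift, alpha=lr)
                p.add_(drift - b * scaled_grad, alpha=lr)
        return loss
```

The two-variable structure means the only extra state beyond RMSprop is one buffer per parameter (the auxiliary variable), which is consistent with the abstract's claim of keeping the memory footprint of standard methods; for the exact update and recommended hyperparameters, refer to the linked code and the paper.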
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: - Simplified the derivation of the algorithm.
- Tempered some claims about the performance of INNAprop.
- Added new experiments in the manuscript and in the appendix.
- Added authors' affiliations and an acknowledgements section.
Code: https://github.com/innaprop/innaprop
Supplementary Material: pdf
Assigned Action Editor: ~Mathurin_Massias1
Submission Number: 5110