A second-order-like optimizer with adaptive gradient scaling for deep learning

TMLR Paper5110 Authors

14 Jun 2025 (modified: 16 Jun 2025) · Under review for TMLR · CC BY 4.0
Abstract: In this empirical article, we introduce INNAprop, an optimization algorithm that combines the INNA method with RMSprop adaptive gradient scaling. It leverages second-order information and adaptive rescaling while keeping the memory and compute requirements of standard deep learning optimizers such as AdamW or SGD. INNAprop is evaluated on CIFAR-10, Food101, and ImageNet with ResNets, VGG, DenseNet, and ViT. We also train GPT-2 from scratch on OpenWebText and fine-tune it with LoRA on E2E. INNAprop consistently matches or outperforms AdamW in both training speed and accuracy, with minimal hyperparameter tuning in large-scale settings.
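Since the abstract only names the ingredients, the following is a minimal, hypothetical NumPy sketch of how an INNA-style two-variable update can be combined with an RMSprop-style running second-moment rescaling of the gradient. It is not the paper's exact INNAprop update; the function name innaprop_sketch, the hyperparameter values, and the initialization of the auxiliary variable are illustrative assumptions.

```python
# Hypothetical sketch only: an INNA-style coupled (theta, psi) update whose
# gradient term is rescaled by an RMSprop-style running second moment.
# This is NOT the paper's exact INNAprop algorithm.
import numpy as np

def innaprop_sketch(grad_fn, theta0, steps=1000, lr=1e-2,
                    alpha=0.5, beta=0.1, rho=0.99, eps=1e-8):
    """grad_fn: callable returning a (possibly stochastic) gradient at theta."""
    theta = np.asarray(theta0, dtype=float).copy()
    # Initialize the auxiliary variable so the initial drift term vanishes
    # (an illustrative choice, not necessarily the paper's).
    psi = (1.0 - alpha * beta) * theta
    v = np.zeros_like(theta)                      # RMSprop second-moment estimate
    for _ in range(steps):
        g = grad_fn(theta)
        v = rho * v + (1.0 - rho) * g * g         # running estimate of E[g^2]
        g_scaled = g / (np.sqrt(v) + eps)         # adaptive gradient rescaling
        # INNA-style coupled update (second-order-like geometry via psi)
        drift = (1.0 / beta - alpha) * theta - (1.0 / beta) * psi
        theta = theta + lr * (drift - beta * g_scaled)
        psi = psi + lr * drift
    return theta

# Toy usage: minimize f(x) = 0.5 * ||x||^2, whose gradient is x.
if __name__ == "__main__":
    x0 = np.array([3.0, -2.0])
    x_star = innaprop_sketch(lambda x: x, x0, steps=5000)
    print(x_star)  # ends up near the origin (sign-like scaled steps leave small oscillations)
```

The memory footprint matches the abstract's claim in spirit: besides the parameters, the sketch stores only the auxiliary variable psi and the second-moment buffer v, i.e., two extra tensors per parameter, comparable to AdamW's two moment buffers.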
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Mathurin_Massias1
Submission Number: 5110