Torque-Aware Momentum

TMLR Paper5519 Authors

31 Jul 2025 (modified: 01 Aug 2025) · Under review for TMLR · CC BY 4.0
Abstract: Efficiently exploring complex loss landscapes is key to the performance of deep neural networks. While momentum-based optimizers are widely used in state-of-the-art setups, classical momentum can still struggle with large, misaligned gradients, leading to oscillations. To address this, we propose Torque-Aware Momentum (TAM), which introduces a damping factor based on the angle between the new gradient and the previous momentum, stabilizing the update direction during training. Empirical results show that TAM, which can be combined with both SGD and Adam, enhances exploration, handles distribution shifts more effectively, and improves generalization performance across various tasks, including image classification, large language model fine-tuning, and continual learning, compared to classical momentum-based optimizers.
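The abstract describes the mechanism only at a high level, and the exact update rule is not given here. As an illustration only, the sketch below shows one way an angle-based damping factor could be folded into SGD with momentum: the damping function (1 + cos θ)/2 and all function and variable names are assumptions for this example, not the paper's definitions.

```python
import torch

def tam_sgd_step(params, momenta, lr=0.1, beta=0.9, eps=1e-8):
    """One hypothetical TAM-style SGD step (sketch, not the paper's rule).

    Damps the momentum update by a factor derived from the angle between
    the current gradient and the previous momentum buffer.
    """
    for p, m in zip(params, momenta):
        if p.grad is None:
            continue
        g = p.grad
        # Cosine of the angle between the new gradient and previous momentum.
        cos = torch.sum(g * m) / (g.norm() * m.norm() + eps)
        # Assumed damping factor: ~1 when aligned, ~0 when opposed,
        # so misaligned gradients contribute less to the momentum buffer.
        tau = (1.0 + cos) / 2.0
        # Damped momentum accumulation, then the usual SGD update.
        m.mul_(beta).add_(tau * g)
        p.data.add_(m, alpha=-lr)
```

Under this assumed rule, a gradient pointing opposite to the momentum is heavily damped rather than allowed to reverse the update direction, which is one plausible reading of how an angle-based factor could reduce the oscillations the abstract mentions.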
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Sanghyuk_Chun1
Submission Number: 5519