Weight decay is a broadly used technique for training state-of-the-art deep networks. Despite its widespread use, its role remains poorly understood. In this work, we highlight that the role of weight decay in modern deep learning is different from its regularization effect studied in classical learning theory. For overparameterized deep networks, we show how weight decay modifies the optimization dynamics, enhancing the ever-present implicit regularization of SGD via loss stabilization.
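As a reference point for the mechanism discussed above, weight decay enters SGD as a shrinkage term added to every gradient step: $w_{t+1} = w_t - \eta\,(\nabla L(w_t) + \lambda w_t)$. Below is a minimal, self-contained sketch of this coupled update on a toy least-squares problem; the data, hyperparameters, and model are illustrative assumptions, not the paper's experimental setup.

```python
# Illustrative sketch (not the paper's setup): plain SGD with coupled
# weight decay on a toy least-squares problem. Each step applies
#   w <- w - lr * (grad_loss(w) + wd * w),
# i.e. weight decay shrinks the weights on every update.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(128, 10))           # toy inputs (assumed shapes)
y = X @ rng.normal(size=10)              # noiseless linear targets

w = rng.normal(size=10)                  # parameters
lr, wd, batch = 0.05, 1e-2, 16           # arbitrary illustrative hyperparameters

for step in range(1000):
    idx = rng.choice(len(X), size=batch, replace=False)
    Xb, yb = X[idx], y[idx]
    grad = Xb.T @ (Xb @ w - yb) / batch  # gradient of (1/2) * mean squared error
    w -= lr * (grad + wd * w)            # SGD step with coupled weight decay
```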