Keywords: Loss of plasticity, warm-start, foundation model
TL;DR: Under standard training schemes, no loss of generalization occurs when the model is trained in a warm-start scenario.
Abstract: As large-scale datasets grow, neural networks are increasingly trained in a sequential manner, raising concerns about plasticity loss—a reduced ability to adapt to new data. Prior studies suggest that warm-start training, which continues from a previously trained model, yields worse generalization than cold-start training, which reinitializes the model at each training phase. However, these works often ignore standard training schemes such as data augmentation. We revisit this problem under standard training schemes and show, through extensive experiments across various settings and multiple downstream tasks, that warm-starting does not harm generalization compared to cold-starting. This finding holds for training from scratch, fine-tuning pre-trained models, and training foundation models in warm-start scenarios, suggesting that warm-starting is a robust and reliable strategy for large-scale neural network training.
Serve As Reviewer: ~Taesup_Moon1
Submission Number: 33