Abstract: Highlights
• An inevitable gap exists between theoretical and practical Top-k sparsification.
• DLS alters the sparsity ratio of each layer during model training.
• DLS achieves both good performance and high training efficiency.
• DLS(s) further reduces the introduced overhead without performance degradation.
• Performance is evaluated on four datasets and a wide variety of models.
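To make the idea of a per-layer sparsity ratio concrete, the following is a minimal sketch of generic Top-k gradient sparsification applied layer by layer. It is not the DLS algorithm: the highlights do not say how DLS chooses or updates each layer's ratio, so the ratios here are fixed, hypothetical inputs, and the names `top_k_sparsify`, `sparsify_per_layer`, `grads`, and `ratios` are illustrative assumptions.

```python
from typing import Dict, List


def top_k_sparsify(grad: List[float], sparsity: float) -> List[float]:
    """Keep the largest-magnitude entries of grad and zero the rest.

    sparsity is the fraction of entries dropped (0.0 keeps all of them).
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    if not grad:
        return []
    # Always keep at least one entry so a layer never sends an empty update.
    k = max(1, round(len(grad) * (1.0 - sparsity)))
    # Indices of the k entries with the largest absolute value.
    order = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)
    keep = set(order[:k])
    return [g if i in keep else 0.0 for i, g in enumerate(grad)]


def sparsify_per_layer(
    grads: Dict[str, List[float]], ratios: Dict[str, float]
) -> Dict[str, List[float]]:
    """Apply Top-k to each layer with that layer's own sparsity ratio."""
    return {name: top_k_sparsify(g, ratios[name]) for name, g in grads.items()}


# Hypothetical flattened gradients for two layers (illustrative values only).
grads = {
    "conv1": [0.5, -0.1, 0.05, -0.9],
    "fc": [0.2, -0.3, 0.01, 0.0, 0.7, -0.05, 0.4, 0.1],
}
# Assumed per-layer ratios; a method such as DLS would adjust these during training.
ratios = {"conv1": 0.5, "fc": 0.75}

sparse = sparsify_per_layer(grads, ratios)
```

With these assumed inputs, each layer keeps two entries: `conv1` drops half of its four values and `fc` drops three quarters of its eight. Plain Python lists are used only to keep the sketch dependency-free; a real training loop would operate on framework tensors.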