Abstract: Highlights
• We study a label-free importance score for structured pruning of autoregressive Transformers.
• We propose an adaptive retraining approach for pruned Transformer models of varying sizes.
• Our pruned models achieve up to a 60% reduction in size with only a <2.4% drop in accuracy.