Perplexed by Perplexity: Perplexity-Based Pruning with Small Reference Models

Published: 04 Mar 2024, Last Modified: 16 May 2024ME-FoMo 2024 PosterEveryoneRevisionsBibTeX
Keywords: Data Pruning, LLM, Transformer, Perplexity, Efficiency, Pretraining
Abstract: In this work, we consider whether pretraining on a pruned high-quality subset of a large-scale text dataset can improve LLM performance. While existing work has shown that pruning based on the perplexity of a larger model can yield high-quality data, we investigate whether smaller models can be used for perplexity-based pruning and how pruning is affected by the domain composition of the data being pruned. We demonstrate that for multiple dataset compositions, perplexity-based pruning of pretraining data can \emph{significantly} improve downstream task performance: pruning based on perplexities computed with a 125 million parameter model improves the average accuracy of downstream tasks of a 3 billion parameter model by up to 1.35\% and achieves up to a $1.36\times$ reduction in pretraining steps to reach commensurate baseline performance.
Submission Number: 54
Loading