Towards Structured Sparsity in Transformers for Efficient Inference

Harry Dong; Beidi Chen; Yuejie Chi

Published: 20 Jun 2023, Last Modified: 16 Jul 2023

Venue: ES-FoMO 2023 Poster

Keywords: Transformers, LLMs, Sparsity, Structured Sparsity, Activation, Regularization

TL;DR: We describe methods to create highly structured sparsity in transformers to improve their efficiency.

Abstract: Transformer models have been critical in accelerating progress in numerous fields, yet scaling these models comes at a high computational cost. In this paper, we explore sparsity properties in transformers and manipulate the existing sparsity to be more structured for efficient training and inference. In particular, we create sparse structures that are block sparse and have inter-layer similarity, which have the potential to bypass a significant amount of model loading and computation. We present preliminary results and ideas using a small transformer, which we hope to extend to more complex models.
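
To illustrate why block-sparse activations could let a model bypass loading and computation, here is a minimal sketch of a feed-forward layer that skips inactive blocks of hidden units. It is not the authors' implementation: the names (`dense_ffn`, `block_sparse_ffn`, `active_blocks_of`), the toy weights, and the block size are all hypothetical, and the active blocks are read off a dense pass here, whereas a practical system would need to predict or reuse them (for example across layers, if the inter-layer similarity holds).

```python
# Toy feed-forward layer: y = W2 @ relu(W1 @ x), in pure Python.
# All names, shapes, and weights are illustrative assumptions.

def relu(v):
    return v if v > 0.0 else 0.0

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def hidden(x, W1):
    # Hidden activations of the dense layer.
    return [relu(dot(row, x)) for row in W1]

def dense_ffn(x, W1, W2):
    h = hidden(x, W1)
    return [dot(row, h) for row in W2]

def active_blocks_of(h, block_size):
    # A block is active if any hidden unit inside it is nonzero.
    n_blocks = len(h) // block_size
    return [b for b in range(n_blocks)
            if any(h[b * block_size + k] != 0.0 for k in range(block_size))]

def block_sparse_ffn(x, W1, W2, blocks, block_size):
    # Touch only the rows of W1 and columns of W2 in active blocks;
    # weights of inactive blocks are never read.
    y = [0.0] * len(W2)
    for b in blocks:
        for j in range(b * block_size, (b + 1) * block_size):
            hj = relu(dot(W1[j], x))
            for i in range(len(W2)):
                y[i] += W2[i][j] * hj
    return y

# Toy setup: 8 hidden units in 4 blocks of 2; blocks 1 and 3 are
# built with negative weights so ReLU zeroes them for positive x.
block_size = 2
x = [1.0, 2.0, 0.5]
W1 = [[(1.0 if (j // block_size) % 2 == 0 else -1.0) * (0.1 * (j + 1) + 0.01 * k)
       for k in range(3)] for j in range(8)]
W2 = [[0.05 * (i + 1) - 0.02 * j for j in range(8)] for i in range(2)]

active = active_blocks_of(hidden(x, W1), block_size)
y_dense = dense_ffn(x, W1, W2)
y_sparse = block_sparse_ffn(x, W1, W2, active, block_size)
```

In this toy setup the sparse path reads half of the hidden-layer weights and matches the dense output, because the skipped blocks contribute exactly zero. Any real savings would depend on how reliably the active blocks can be identified ahead of time.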

Submission Number: 43
