EfficientSkip: Efficiently Transforming Dense LLMs into Sparse Variants

26 Sept 2024 (modified: 26 Nov 2024) · ICLR 2025 Conference Withdrawn Submission · CC BY 4.0
Keywords: efficient LLM, skip token, conditional computation
Abstract: Transformer-based LLMs achieve great success on a variety of NLP tasks, including machine translation, text summarization, and text generation. However, training such a powerful LLM requires a huge amount of computation and data. Researchers have proposed transformer-based conditional computation algorithms that significantly reduce redundant computation on certain tokens. By skipping dense attention and feed-forward computations for these tokens, such approaches yield sparse LLMs. However, these sparse LLMs are trained from scratch, which again requires substantial computation and data. Therefore, in this paper, we propose a training paradigm that can effectively transform a dense transformer-based LLM into its sparse variant with very limited computational resources and merely millions of tokens. We conduct thorough investigations into the key factors that may influence the dense-to-sparse transformation through extensive empirical experiments. In addition, we conduct a case study on how tokens skip layers and analyze their Part-of-Speech tags, gaining valuable insights.
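The abstract describes conditional computation in which individual tokens skip a layer's attention and feed-forward computation. The paper's exact mechanism is not given here, so the following is only a minimal sketch of one common form of per-token layer skipping: a small learned gate scores each token, and tokens below a threshold pass through the block via the residual connection only. All names (`SkippableBlock`, `gate`, `threshold`) and hyperparameters are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (assumed, not the paper's method): per-token skipping of a
# transformer block. A learned gate scores each token; tokens whose score is
# below a threshold contribute nothing from attention/FFN and are carried
# forward by the residual stream alone.
import torch
import torch.nn as nn


class SkippableBlock(nn.Module):
    def __init__(self, d_model: int = 512, n_heads: int = 8, threshold: float = 0.5):
        super().__init__()
        self.gate = nn.Linear(d_model, 1)   # per-token skip score (assumed design)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.threshold = threshold

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        keep = (torch.sigmoid(self.gate(x)) > self.threshold).float()  # (batch, seq, 1)
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + keep * attn_out            # skipped tokens keep only the residual
        x = x + keep * self.ffn(self.norm2(x))
        return x


if __name__ == "__main__":
    block = SkippableBlock()
    tokens = torch.randn(2, 16, 512)
    print(block(tokens).shape)  # torch.Size([2, 16, 512])
```

Note that this sketch masks the block's output after computing it, which demonstrates the behavior but not the savings; an efficiency-oriented implementation would gather only the kept tokens before the attention and feed-forward calls so that skipped tokens incur no FLOPs.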
Primary Area: other topics in machine learning (i.e., none of the above)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6793