DiffSkip: Differential Layer Skipping in Large Language Models


ACL ARR 2025 February Submission5989 Authors

16 Feb 2025 (modified: 09 May 2025), ACL ARR 2025 February Submission, CC BY 4.0

Abstract: Existing Large Language Models (LLMs) apply the same amount of computation to every token. We analyze the correlation between the input-output difference of the self-attention block and that of the Feed-Forward Network (FFN) within the same transformer layer, and find that these two differential vectors are highly correlated. Based on this finding, we propose to skip FFN blocks dynamically according to the self-attention difference, and we introduce Differential Layer Skipping (DiffSkip) to show that LLMs are inherently dynamic-depth models, capable of adjusting their computational depth when generating different tokens. DiffSkip employs a lightweight router module to dynamically skip a set of FFN blocks in LLMs and requires only efficient fine-tuning, with the entire LLM kept frozen. Experimental results demonstrate that DiffSkip effectively enables dynamic FFN skipping in decoder-only language models, even in continuous token generation tasks, where many layer-skipping methods struggle.
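The core mechanism described in the abstract, using the self-attention difference to decide whether a layer's FFN runs, can be sketched as a toy example. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not specify the router's architecture, its training objective, or a skipping threshold, so the linear-plus-sigmoid router, the 0.5 threshold, and the names `HypotheticalRouter` and `layer_forward` are all hypothetical. Layer normalization is omitted for brevity, and the attention and FFN sublayers are passed in as plain callables.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def add(a: Sequence[float], b: Sequence[float]) -> Vector:
    return [x + y for x, y in zip(a, b)]


def sub(a: Sequence[float], b: Sequence[float]) -> Vector:
    return [x - y for x, y in zip(a, b)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either is all zeros.

    Usable for the analysis step: compare the self-attention difference
    with the FFN difference of the same layer.
    """
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def stable_sigmoid(z: float) -> float:
    # Split by sign so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


class HypotheticalRouter:
    """ASSUMPTION: a linear scorer over the self-attention difference.

    The real DiffSkip router design is not given in the abstract.
    """

    def __init__(self, weights: Sequence[float], bias: float = 0.0,
                 threshold: float = 0.5) -> None:
        self.weights = list(weights)
        self.bias = bias
        self.threshold = threshold

    def keep_probability(self, delta_attn: Sequence[float]) -> float:
        z = sum(w * d for w, d in zip(self.weights, delta_attn)) + self.bias
        return stable_sigmoid(z)

    def should_run_ffn(self, delta_attn: Sequence[float]) -> bool:
        return self.keep_probability(delta_attn) >= self.threshold


def layer_forward(x: Vector,
                  attn: Callable[[Vector], Vector],
                  ffn: Callable[[Vector], Vector],
                  router: HypotheticalRouter) -> Tuple[Vector, bool]:
    """One residual layer with a skippable FFN; returns (output, ffn_ran)."""
    h = add(x, attn(x))        # residual self-attention: h = x + Attn(x)
    delta_attn = sub(h, x)     # input-output difference of the attention block
    if router.should_run_ffn(delta_attn):
        return add(h, ffn(h)), True   # run the FFN: h + FFN(h)
    return h, False                   # skip the FFN: pass h through unchanged


if __name__ == "__main__":
    x = [1.0, -2.0, 0.5]
    attn = lambda v: [0.1 * t for t in v]   # toy stand-in for self-attention
    ffn = lambda v: [0.5 * t for t in v]    # toy stand-in for the FFN
    print(layer_forward(x, attn, ffn, HypotheticalRouter([1.0] * 3, bias=5.0)))
    print(layer_forward(x, attn, ffn, HypotheticalRouter([1.0] * 3, bias=-5.0)))
```

Because both sublayers sit on the residual stream, skipping the FFN simply leaves the hidden state at `h`, so a skipped layer costs only the attention computation plus the router's small scoring step. In this toy setup the two difference vectors are exact scalar multiples of `x`, so their cosine similarity is 1; real models would show the high but imperfect correlation the abstract reports.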

Paper Type: Long

Research Area: Language Modeling

Research Area Keywords: Language Modeling, Efficient/Low-Resource Methods for NLP, Interpretability and Analysis of Models for NLP

Contribution Types: Model analysis & interpretability, Approaches low compute settings-efficiency

Languages Studied: English

Submission Number: 5989
