Keywords: Large Language Models; Model Compression; Structured Pruning; Kernel Space
TL;DR: This paper proposes a layer pruning method called Kernelized Dynamics Pruning (KDP), which simplifies the non-linear transformations between an LLM's consecutive layers by projecting them into a kernel space where they become approximately linear.
Abstract: This paper proposes Kernelized Dynamics Pruning (KDP), a novel layer pruning method from the perspective of simplifying representation dynamics within large language models (LLMs). Motivated by the high similarity between consecutive layer representations, we view the LLM's forward pass as a discrete-time dynamical system. We speculate that this phenomenon indicates the model's internal dynamics have entered a "slow manifold", which exhibits computational redundancy. Based on this insight, we project the representations into a kernel space where the complex, non-linear transformation between them is simplified to an approximately linear one. Then, a simple network learns the inverse kernel transformation, thereby enabling the pruning of the entire layer block. Both theoretical analysis and extensive experiments validate the effectiveness of KDP, demonstrating its superiority over existing pruning baselines.
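The core idea in the abstract (linearizing the transformation between consecutive layer representations inside a kernel space) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the kernel feature map here is random Fourier features, the "consecutive layers" are simulated by a toy residual-style update, and all names (`phi`, `A`, `h_next`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for consecutive-layer hidden states (assumed data, not from
# the paper): h_next is a mildly non-linear, residual-style update of h.
d, n = 16, 512
h = rng.normal(size=(n, d))
h_next = np.tanh(h @ rng.normal(size=(d, d)) * 0.5) + h

# Random Fourier features as an illustrative kernel map phi; the paper's
# actual choice of kernel space is not specified in this abstract.
D = 256
W = rng.normal(size=(d, D))
b = rng.uniform(0, 2 * np.pi, size=D)
phi = lambda x: np.sqrt(2.0 / D) * np.cos(x @ W + b)

# Fit a single LINEAR map A in kernel space: phi(h_next) ≈ phi(h) @ A.
# In KDP, a small network would additionally learn the inverse kernel
# transformation back to representation space, letting the original
# layer block be pruned.
Phi, Phi_next = phi(h), phi(h_next)
A, *_ = np.linalg.lstsq(Phi, Phi_next, rcond=None)

# Relative reconstruction error of the linearized dynamics in kernel space
err = np.linalg.norm(Phi @ A - Phi_next) / np.linalg.norm(Phi_next)
```

A small `err` here would indicate that, for this toy system, the non-linear layer-to-layer update is indeed well approximated by one linear operator in the kernel feature space.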
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 8414