Mitigating Text Degeneration via Token-Level Guidance for Pruned Large Language Models

12 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Large Language Models, Text Degeneration, Pruning, Distillation
TL;DR: Suppressing text degeneration in pruned large language models
Abstract: Large language models (LLMs) suffer from substantial memory and inference costs, and pruning has emerged as a widely adopted strategy for compression. However, while pruning effectively reduces model size and latency, it often exacerbates undesirable side effects such as text degeneration, particularly repetition, even when perplexity remains largely intact. We observe that standard post-pruning fine-tuning is insufficient to suppress repetition, motivating the need for more targeted approaches. To address this issue, we propose two token-level guidance methods: FOCUS and RePAIR. FOCUS adjusts token probabilities through token-weighted distillation, focusing on high-confidence regions to better align the student with the teacher while reducing the likelihood of undesirable tokens. In contrast, RePAIR employs contrastive training with negative and positive samples to explicitly encourage the generation of alternative tokens. Experiments on open-ended and instruction-based generation tasks demonstrate that our methods substantially reduce repetition and improve generation diversity, while causing only minimal impact on perplexity. Furthermore, our methods are compatible with other training strategies and consistently enhance their performance. Code will be available soon.
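The token-weighted distillation behind FOCUS can be illustrated with a toy sketch. The weighting scheme below (scaling each token's KL term by the teacher's maximum probability, so high-confidence positions dominate) is an assumption for illustration only, not the paper's exact formulation; `token_weighted_distill_loss` and its inputs are hypothetical names.

```python
import math

def token_weighted_distill_loss(teacher_probs, student_probs):
    """Per-token KL(teacher || student), weighted by teacher confidence.

    Hypothetical sketch in the spirit of FOCUS: positions where the
    teacher is confident (high max probability) contribute more, pulling
    the student toward the teacher's high-confidence regions.
    """
    total, weight_sum = 0.0, 0.0
    for t, s in zip(teacher_probs, student_probs):
        # KL divergence at this token position (skip zero-probability terms)
        kl = sum(p * math.log(p / q) for p, q in zip(t, s) if p > 0)
        w = max(t)  # teacher confidence at this position (assumed weighting)
        total += w * kl
        weight_sum += w
    return total / weight_sum

# Toy example: 2 positions over a 3-token vocabulary
teacher = [[0.8, 0.1, 0.1], [0.4, 0.3, 0.3]]
student = [[0.5, 0.3, 0.2], [0.4, 0.3, 0.3]]
loss = token_weighted_distill_loss(teacher, student)
```

With identical teacher and student distributions the loss is zero; any mismatch at a high-confidence position is penalized more heavily than the same mismatch at a low-confidence one.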
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 4538