The Lottery LLM Hypothesis: Rethinking What Abilities Should LLM Compression Preserve?
Blogpost Url: https://d2jud02ci9yv69.cloudfront.net/2025-04-28-the-lottery-llm-hyperthesis-217/blog/the-lottery-llm-hyperthesis/
Abstract: Motivated by reducing the computational and storage costs of LLMs, model compression and KV cache compression have attracted considerable attention from researchers. However, current methodologies predominantly emphasize preserving the performance of compressed LLMs, as measured by perplexity or simple accuracy on tasks involving common-sense knowledge question answering and basic arithmetic reasoning. In this blog, we present a brief review of recent advances in LLMs related to retrieval-augmented generation, multi-step reasoning, external tools, and computational expressivity, all of which substantially enhance LLM performance. We then propose a lottery LLM hypothesis suggesting that for a given LLM and task, there exists a smaller lottery LLM capable of achieving the same performance as the original LLM with the assistance of multi-step reasoning and external tools. Based on this review of current progress in LLMs, we discuss and summarize the essential capabilities that the lottery LLM and KV cache compression must possess, which are currently overlooked in existing methods.
Conflict Of Interest: I have used three of my own papers to illustrate the energy cost of deep learning models and LLM compression. The papers are:
The Impact of GPU DVFS on the Energy and Performance of Deep Learning: An Empirical Study,
STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs,
Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for Large Language Models.
These citations are included solely to provide a concrete and relevant example for the discussion of energy consumption and LLM compression limitations. They are not intended to highlight or promote the work itself.
Submission Number: 97