TGRS: Teacher-Guided Rank-Sensitive Quantization for Large Language Models

ICLR 2026 Conference Submission 14746 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: large language model, quantization, compression, edge deployment, budget-aware allocation
TL;DR: We propose Teacher-Guided Rank Sensitivity (TGRS), an LLM compression framework that scores the singular directions of weight matrices by their impact on predictions and allocates low-rank corrections under a global budget, without retraining or custom kernels.
Abstract: Compression techniques such as quantization and low-rank approximation have enabled large language models (LLMs) to run on resource-constrained hardware, but they often fail to capture the heterogeneous sensitivity of model components. In this paper, we propose **Teacher-Guided Rank Sensitivity (TGRS)**, a novel LLM compression framework that uses a data-informed, direction-level sensitivity profiler to directly quantify parameter importance with respect to prediction accuracy and representational capacity. By projecting these importance signals onto the singular directions of weight matrices, TGRS establishes a principled scoring mechanism that drives a global, budget-aware allocation strategy. The allocator dynamically determines where low-rank corrections are most effective, preserving expressivity in sensitive regions while aggressively compressing redundant ones. TGRS requires no retraining or custom kernels, yet consistently outperforms prior methods. Most notably, on LLaMA-3.1-8B at an aggressive 3.6-bit budget, TGRS achieves 4.4$\times$ compression with minimal perplexity degradation, from 6.63 to 6.78 (+0.15). By reducing the memory footprint from 16GB to 3.9GB, TGRS enables deployment on edge devices such as the Jetson Orin Nano (8GB memory).
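
The sketch below is not the authors' implementation; it only illustrates the two ideas named in the abstract, scoring singular directions of a weight matrix with calibration data and then spending a global rank budget on the highest-scoring directions. All names (`direction_scores`, `allocate_ranks`, `rank_budget`) and the particular energy-based score are hypothetical choices for this example.

```python
# Minimal sketch of sensitivity-guided, budget-aware rank allocation (illustrative only).
import torch

def direction_scores(weight: torch.Tensor, calib_inputs: torch.Tensor) -> torch.Tensor:
    """Score each singular direction of `weight` by the output energy it carries
    on calibration inputs (a simple proxy for sensitivity)."""
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)  # weight ≈ U diag(S) Vh
    proj = Vh @ calib_inputs.T                                 # project inputs onto right singular directions
    return (S ** 2) * proj.pow(2).sum(dim=1)                   # sigma_i^2 * ||v_i^T X||^2 per direction

def allocate_ranks(scores_per_layer: list[torch.Tensor], rank_budget: int) -> list[int]:
    """Greedy global allocation: spend the total rank budget on the
    highest-scoring directions across all layers."""
    tagged = [(float(s), layer_idx)
              for layer_idx, scores in enumerate(scores_per_layer)
              for s in scores]
    tagged.sort(reverse=True)
    ranks = [0] * len(scores_per_layer)
    for _, layer_idx in tagged[:rank_budget]:
        ranks[layer_idx] += 1
    return ranks

# Toy usage with random weights and calibration activations.
torch.manual_seed(0)
layers = [torch.randn(64, 64) for _ in range(4)]
calib = torch.randn(128, 64)                      # 128 calibration samples
scores = [direction_scores(W, calib) for W in layers]
print(allocate_ranks(scores, rank_budget=48))     # per-layer ranks chosen under the budget
```

In this toy setup, layers whose singular directions carry more output energy on the calibration data receive more of the shared rank budget, mirroring the abstract's idea of preserving expressivity in sensitive regions while compressing redundant ones.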
Supplementary Material: zip
Primary Area: infrastructure, software libraries, hardware, systems, etc.
Submission Number: 14746