Microarchitecture Is Destiny: Performance and Accuracy of Quantized LLMs on Consumer Hardware

ICLR 2026 Conference Submission 8962 Authors

17 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: Large Language Models (LLMs), Quantization, Post-Training Quantization (PTQ), Consumer Hardware, GPU Microarchitecture, Tensor Cores, Inference Performance, System-level Bottlenecks, Quantization as Regularization, System-Aware Evaluation
Abstract: Out-of-the-box quantization is widely deployed on consumer-grade hardware, yet its impact on Large Language Models (LLMs) is more nuanced than the prevailing assumption that lower precision simply trades accuracy for efficiency. This study presents a rigorous empirical evaluation across four generations of NVIDIA GPUs and uncovers two core, often counter-intuitive findings. First, contrary to the view that quantization universally degrades performance on complex tasks, the analysis demonstrates that for large models (14B+ parameters), popular 8-bit and 4-bit quantization schemes can yield substantial accuracy improvements on mathematical reasoning benchmarks over their 16-bit floating-point counterparts, which suffer from system-level bottlenecks in resource-constrained environments. Second, the investigation reveals that for smaller models prone to overfitting, the noise introduced by these same quantization schemes can act as an effective computational regularizer, unexpectedly enhancing generalization. The performance analysis further establishes that, once a model fits in VRAM, inference throughput is determined primarily by the GPU microarchitecture's support for low-precision integer arithmetic rather than by VRAM capacity. These findings move beyond a simplistic accuracy-versus-efficiency trade-off and offer practitioners an evidence-based framework for navigating the interplay between model scale, hardware capabilities, and reasoning fidelity.
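As an illustration of the kind of setup the abstract describes, the sketch below loads one checkpoint in 16-bit, 8-bit, and 4-bit precision and reports whether the local GPU microarchitecture exposes integer Tensor Cores. This is not the authors' evaluation harness; it assumes the Hugging Face transformers + bitsandbytes stack, and the model name is a placeholder.

```python
# Minimal sketch (not from the paper): compare memory footprints of a model loaded
# in FP16, INT8, and 4-bit NF4, and report whether the GPU has INT8 Tensor Cores.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

MODEL = "meta-llama/Llama-2-13b-hf"  # placeholder checkpoint, not specified by the paper

# Compute capability >= 7.5 (Turing and later) indicates INT8 Tensor Core support;
# older consumer GPUs fall back to slower kernels even when VRAM capacity suffices.
major, minor = torch.cuda.get_device_capability()
has_int8_tc = (major, minor) >= (7, 5)
print(f"Compute capability {major}.{minor}, INT8 Tensor Cores: {'yes' if has_int8_tc else 'no'}")

configs = {
    "fp16": dict(torch_dtype=torch.float16),
    "int8": dict(quantization_config=BitsAndBytesConfig(load_in_8bit=True)),
    "nf4":  dict(quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.float16)),
}

for name, kwargs in configs.items():
    # Load the same checkpoint under each precision regime and report its footprint.
    model = AutoModelForCausalLM.from_pretrained(MODEL, device_map="auto", **kwargs)
    print(f"{name}: {model.get_memory_footprint() / 1e9:.1f} GB")
    del model
    torch.cuda.empty_cache()
```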
Primary Area: datasets and benchmarks
Supplementary Material: zip
Submission Number: 8962