Quant-LLM: Accelerating the Serving of Large Language Models via FP6-Centric Algorithm-System Co-Design on Modern GPUs

Published: 01 Jan 2024, Last Modified: 09 May 2025. Venue: USENIX ATC 2024. License: CC BY-SA 4.0.