MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models

Published: 2025, Last Modified: 19 Jan 2026, PPoPP 2025, CC BY-SA 4.0