MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models

Elias Frantar, Roberto L. Castro, Jiale Chen, Torsten Hoefler, Dan Alistarh

Published: 28 Feb 2025, Last Modified: 19 Jan 2026 | Crossref | Everyone | CC BY-SA 4.0