Contemporary Model Compression on Large Language Models Inference

Published: 01 Jan 2024 · Last Modified: 03 Mar 2025 · CoRR 2024 · CC BY-SA 4.0
Abstract: This paper surveys modern techniques for efficient training and inference of foundation models, organized from two perspectives: model design and system design. Both perspectives optimize LLM training and inference to reduce computational resource requirements, making LLMs more efficient, affordable, and accessible. The accompanying paper-list repository is available at https://github.com/NoakLiu/Efficient-Foundation-Models-Survey.