Keywords: LLM, Training Efficiency, Distributed Computing, Triton
Abstract: Training large language models (LLMs) efficiently at scale remains challenging due to rising compute and memory demands. We present Liger-Kernel, an open-source Triton kernel suite for core LLM primitives and diverse loss functions (pre-training, SFT, distillation, alignment, RLHF). Each kernel uses aggressive operator fusion, in-place gradient computation, and, where advantageous, input chunking to curb memory traffic and kernel-launch overhead. On widely used LLMs, these optimizations boost throughput by ~20% and cut GPU memory consumption by ~60% versus Hugging Face baselines. The code is available under a permissive license at https://github.com/linkedin/Liger-Kernel.
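To make the input-chunking idea mentioned in the abstract concrete, here is a minimal PyTorch sketch of a chunked LM-head projection plus cross-entropy loss, which avoids materializing the full (tokens x vocab) logits matrix at once. This is an illustration of the general technique only, not the library's fused Triton implementation; the function name `chunked_linear_cross_entropy` and the chunk size are hypothetical.

```python
import torch
import torch.nn.functional as F

def chunked_linear_cross_entropy(hidden, weight, targets, chunk_size=1024):
    """Illustrative sketch: compute the LM-head projection and cross-entropy
    loss chunk by chunk so that only a (chunk_size x vocab) slice of logits
    exists in memory at any time. Not the Liger-Kernel Triton kernel itself."""
    total_loss = hidden.new_zeros(())
    n_tokens = hidden.shape[0]
    for start in range(0, n_tokens, chunk_size):
        end = min(start + chunk_size, n_tokens)
        # Project only the current chunk of hidden states to vocabulary logits.
        logits = hidden[start:end] @ weight.t()  # (chunk, vocab)
        # Accumulate the summed loss over the chunk.
        total_loss = total_loss + F.cross_entropy(
            logits, targets[start:end], reduction="sum"
        )
    return total_loss / n_tokens

# Hypothetical usage: final-layer hidden states, lm_head weight, next-token targets.
hidden = torch.randn(4096, 768)
weight = torch.randn(32000, 768)
targets = torch.randint(0, 32000, (4096,))
loss = chunked_linear_cross_entropy(hidden, weight, targets)
```

The peak-memory benefit comes from never allocating all token logits simultaneously; the library pushes this further by fusing the projection, loss, and in-place gradient computation into Triton kernels.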
Submission Number: 4