Liger-Kernel: Efficient Triton Kernels for LLM Training

Published: 09 Jun 2025 · Last Modified: 14 Jul 2025 · CODEML@ICML25 · CC BY 4.0
Keywords: LLM, Training Efficiency, Distributed Computing, Triton
Abstract: Training large language models (LLMs) efficiently at scale remains challenging due to rising compute and memory demands. We present Liger‑Kernel, an open‑source Triton kernel suite for core LLM primitives and diverse loss functions (pre-training, SFT, distillation, alignment, RLHF). Each kernel uses aggressive operator fusion, in‑place gradient computation, and, where advantageous, input chunking to curb memory traffic and kernel‑launch overhead. On widely used LLMs, these optimizations boost throughput by ~20% and cut GPU memory consumption by ~60% versus Hugging Face baselines. The code is available under a permissive license at https://github.com/linkedin/Liger-Kernel.
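To make the operator-fusion idea in the abstract concrete, below is a minimal illustrative Triton sketch of a fused RMSNorm forward pass: the reduction, normalization, and weight scaling happen in a single kernel, so each row is read from and written to global memory once instead of materializing intermediates across separate launches. This is not Liger-Kernel's actual implementation; the function names, the 2-D contiguous float32 input assumption, and the block-size choice are assumptions made for the example.

```python
import torch
import triton
import triton.language as tl


@triton.jit
def fused_rmsnorm_fwd(x_ptr, w_ptr, y_ptr, n_cols, eps, BLOCK_SIZE: tl.constexpr):
    # One program handles one row: load once, reduce, normalize, scale, store once.
    row = tl.program_id(0)
    cols = tl.arange(0, BLOCK_SIZE)
    mask = cols < n_cols
    x = tl.load(x_ptr + row * n_cols + cols, mask=mask, other=0.0)
    w = tl.load(w_ptr + cols, mask=mask, other=0.0)
    rms = tl.sqrt(tl.sum(x * x, axis=0) / n_cols + eps)  # in-kernel reduction
    y = (x / rms) * w                                    # normalize + scale, no intermediate writes
    tl.store(y_ptr + row * n_cols + cols, y, mask=mask)


def fused_rmsnorm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Illustrative wrapper; assumes a contiguous 2-D float32 CUDA tensor (n_rows, n_cols).
    n_rows, n_cols = x.shape
    y = torch.empty_like(x)
    BLOCK_SIZE = triton.next_power_of_2(n_cols)
    fused_rmsnorm_fwd[(n_rows,)](x, weight, y, n_cols, eps, BLOCK_SIZE=BLOCK_SIZE)
    return y
```

The same pattern extends to the backward pass and to the chunked loss kernels the abstract mentions: by processing one block of rows per program and fusing adjacent elementwise and reduction ops, memory traffic and kernel-launch overhead drop without changing numerics.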
Submission Number: 4