Keywords: LLM, Training Efficiency, Distributed Computing, Triton
Abstract: Training large language models (LLMs) efficiently at scale remains challenging due to rising compute and memory demands. We present Liger-Kernel, an open-source Triton kernel suite for core LLM primitives and diverse loss functions (pre-training, SFT, distillation, alignment, RLHF). Each kernel uses aggressive operator fusion, in-place gradient computation, and, where advantageous, input chunking to curb memory traffic and kernel-launch overhead. On widely used LLMs, these optimizations boost throughput by ~20% and cut GPU memory consumption by ~60% versus Hugging Face baselines. The code is available under a permissive license at https://github.com/linkedin/Liger-Kernel.
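To make the input-chunking idea mentioned in the abstract concrete, here is a minimal PyTorch sketch of a chunked LM-head projection plus cross-entropy loss, which avoids materializing the full (tokens x vocab) logits matrix at once. This is an illustration of the general technique only, not the library's fused Triton implementation; the function name `chunked_linear_cross_entropy` and the chunk size are hypothetical.

```python
import torch
import torch.nn.functional as F

def chunked_linear_cross_entropy(hidden, weight, targets, chunk_size=1024):
    """Illustrative sketch: compute the LM-head projection and cross-entropy
    loss chunk by chunk so that only a (chunk_size x vocab) slice of logits
    exists in memory at any time. Not the Liger-Kernel Triton kernel itself."""
    total_loss = hidden.new_zeros(())
    n_tokens = hidden.shape[0]
    for start in range(0, n_tokens, chunk_size):
        end = min(start + chunk_size, n_tokens)
        # Project only the current chunk of hidden states to vocabulary logits.
        logits = hidden[start:end] @ weight.t()  # (chunk, vocab)
        # Accumulate the summed loss over the chunk.
        total_loss = total_loss + F.cross_entropy(
            logits, targets[start:end], reduction="sum"
        )
    return total_loss / n_tokens

# Hypothetical usage: final-layer hidden states, lm_head weight, next-token targets.
hidden = torch.randn(4096, 768)
weight = torch.randn(32000, 768)
targets = torch.randint(0, 32000, (4096,))
loss = chunked_linear_cross_entropy(hidden, weight, targets)
```

The peak-memory benefit comes from never allocating all token logits simultaneously; the library pushes this further by fusing the projection, loss, and in-place gradient computation into Triton kernels.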
Submission Number: 4