AutoSP: Unlocking Long-Context LLM Training Via Compiler-Based Sequence Parallelism

ICLR 2026 Conference Submission 22044 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Long context training, Sequence Parallelism
TL;DR: An automated approach for lifting sequence parallelism, and other targeted memory optimizations for long-context training, into the compiler.
Abstract: Large language models (LLMs) demonstrate enormous utility on long-context tasks that require processing prompts of tens to hundreds of thousands of tokens. However, existing LLM training libraries do not provide easy-to-use abstractions for optimizing long-context training, focusing instead on optimizations for models with large parameter counts, such as ZeRO-3/FSDP, tensor parallelism, and pipeline parallelism. This forces users to rewrite LLM training libraries to incorporate compositions of complex long-context optimizations, such as sequence parallelism, into their training pipelines, a process that requires in-depth expertise and reduces developer productivity. To tackle these challenges, we introduce AutoSP: the first automated solution for optimizing LLM training for longer contexts. AutoSP compiles models and applies a targeted set of optimizations, automated sequence parallelism and long-context-aware activation checkpointing, to drastically improve LLM trainability at negligible cost to throughput. Our evaluation demonstrates AutoSP's capability on both NVIDIA and AMD hardware, increasing training context lengths by up to 2.7$\times$ and 2.5$\times$, respectively, at negligible cost to runtime performance relative to competitive hand-written baselines.
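For readers unfamiliar with the underlying idea, the sketch below illustrates the basic mechanics of sequence parallelism: activations are sharded along the sequence dimension so that per-GPU activation memory scales with seq_len / world_size, with collective communication reintroducing the full sequence only where an operation (such as attention) requires it. This is a minimal illustrative sketch, not AutoSP's actual implementation; the function names `shard_sequence` and `gather_for_attention` are hypothetical.

```python
# Illustrative sketch of sequence parallelism (not AutoSP's implementation).
# Activations of shape [batch, seq_len, hidden] are split along the sequence
# axis across data-parallel ranks, cutting per-GPU activation memory.
import torch
import torch.distributed as dist


def shard_sequence(hidden: torch.Tensor, rank: int, world_size: int) -> torch.Tensor:
    """Keep only this rank's contiguous shard of the sequence dimension."""
    seq_len = hidden.shape[1]
    assert seq_len % world_size == 0, "sequence length must divide evenly across ranks"
    chunk = seq_len // world_size
    return hidden[:, rank * chunk : (rank + 1) * chunk, :].contiguous()


def gather_for_attention(local_hidden: torch.Tensor) -> torch.Tensor:
    """All-gather sequence shards before an operation that needs the full sequence."""
    world_size = dist.get_world_size()
    shards = [torch.empty_like(local_hidden) for _ in range(world_size)]
    dist.all_gather(shards, local_hidden)
    return torch.cat(shards, dim=1)
```

In practice, systems combine this sharding with communication-efficient attention variants and activation checkpointing so that the full sequence is never materialized on a single device for longer than necessary; AutoSP's contribution, per the abstract, is applying such optimizations automatically at the compiler level.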
Primary Area: infrastructure, software libraries, hardware, systems, etc.
Submission Number: 22044