Abstract: Large language models (LLMs) are now expected to process inputs of unprecedented length, with current context limits extending to several million tokens. To meet these long-sequence demands, fine-tuning frameworks must also support post-training on extended sequences. Building on the LLaMA-Factory framework, we implemented multiple sequence parallelism methods (DeepSpeed-Ulysses and Ring-Attention), providing practical support for sequence parallelism over long sequences. We further extended DeepSpeed-Ulysses with dummy heads to handle cases where the number of attention heads is not divisible by the sequence parallel size. We also conducted an in-depth analysis of the practical issues and potential errors that arise when applying sequence parallelism to post-training. Finally, we experimentally validated the correctness of our sequence parallelism implementation, demonstrated the efficiency of our Dummy-Head Ulysses, and compared different sequence parallel strategies in terms of maximum sequence length and runtime efficiency. Our code is available at https://anonymous.4open.science/r/SP-LLaMA-Factory-B8B1.
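The abstract's dummy-head extension can be illustrated with a minimal sketch (not the authors' code): pad the head dimension with zero-valued dummy heads so the head count becomes divisible by the sequence-parallel size that Ulysses-style all-to-all requires, then drop the padding afterwards. The helper names (pad_dummy_heads, drop_dummy_heads) and the sp_size parameter are illustrative assumptions, not the paper's API.

```python
# Sketch of the dummy-head idea, assuming (batch, seq_len, num_heads, head_dim) layout.
import torch


def pad_dummy_heads(x: torch.Tensor, sp_size: int) -> tuple[torch.Tensor, int]:
    """Append zero-valued heads so the head count is divisible by sp_size."""
    num_heads = x.shape[2]
    pad = (-num_heads) % sp_size  # heads to add so (num_heads + pad) % sp_size == 0
    if pad:
        zeros = x.new_zeros(x.shape[0], x.shape[1], pad, x.shape[3])
        x = torch.cat([x, zeros], dim=2)
    return x, pad


def drop_dummy_heads(x: torch.Tensor, pad: int) -> torch.Tensor:
    """Remove the dummy heads appended by pad_dummy_heads."""
    return x[:, :, : x.shape[2] - pad, :] if pad else x


# Usage: 14 attention heads with sequence-parallel size 4 are padded to 16 heads,
# processed (e.g., by the Ulysses all-to-all and attention), then restored.
q = torch.randn(2, 128, 14, 64)
q_padded, n_pad = pad_dummy_heads(q, sp_size=4)
assert q_padded.shape[2] % 4 == 0
q_restored = drop_dummy_heads(q_padded, n_pad)
assert torch.equal(q_restored, q)
```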
Paper Type: Long
Research Area: Efficient/Low-Resource Methods for NLP
Research Area Keywords: LLM Efficiency, NLP in resource-constrained settings
Contribution Types: NLP engineering experiment, Approaches to low-resource settings, Publicly available software and/or pre-trained models
Languages Studied: common to all languages
Submission Number: 1686