BurstEngine: An Efficient Distributed Framework for Training Transformers on Extremely Long Sequences of over 1M Tokens

Published: 2025, Last Modified: 22 Jan 2026 | SC 2025 | CC BY-SA 4.0