PipelineRL: Faster On-policy Reinforcement Learning for Long Sequence Generation

Alexandre Piché, Ehsan Kamalloo, Rafael Pardinas, Xiaoyin Chen, Dzmitry Bahdanau

Published: 2026, Last Modified: 30 May 2026. Trans. Mach. Learn. Res. 2026. CC BY-SA 4.0