KLong: Training LLM Agent for Extremely Long-horizon Tasks

Published: 02 Mar 2026 · Last Modified: 10 Apr 2026 · LLA 2026 Poster · License: CC BY 4.0
Keywords: Large Language Models, LLM Agent, Extremely Long-horizon Task, Agentic Reinforcement Learning
Abstract: This paper introduces KLong, an open-source LLM agent trained to solve extremely long-horizon tasks. The core idea is to first cold-start the model via trajectory-splitting SFT and then scale it via progressive RL training. Specifically, we first activate the basic agentic abilities of a base model with a comprehensive SFT recipe. We then introduce Research-Factory, an automated pipeline that generates high-quality training data by collecting research papers and constructing evaluation rubrics. Using this pipeline, we build thousands of long-horizon trajectories distilled from a frontier model. To train on these extremely long trajectories, we propose trajectory-splitting SFT, which preserves early context, progressively truncates later context, and maintains overlap between consecutive sub-trajectories. To further improve long-horizon task-solving capability, we propose progressive RL, which schedules training into multiple stages with progressively extended timeouts. Extensive experiments demonstrate the superiority and generalization of KLong, as shown in Figure 1. Notably, KLong (106B) surpasses Kimi K2 Thinking (1T) by 11.28\% on PaperBench, and the improvement generalizes to other coding benchmarks such as SWE-bench Verified and MLE-bench.
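The trajectory-splitting idea described in the abstract can be illustrated with a minimal sketch. The function below is not from the paper; the name `split_trajectory` and its parameters (`n_early`, `window`, `overlap`) are illustrative assumptions about one way to preserve early context while sliding an overlapping window over the later steps:

```python
def split_trajectory(steps, n_early, window, overlap):
    """Illustrative sketch of trajectory splitting (not the paper's code).

    Each sub-trajectory keeps the first `n_early` steps (the preserved
    early context) plus a contiguous window over the remaining steps;
    consecutive windows share `overlap` steps, so earlier later-context
    is progressively dropped as the window advances.
    """
    early = steps[:n_early]          # early context kept in every split
    rest = steps[n_early:]           # later context, covered by windows
    stride = window - overlap        # how far each window advances
    subs = []
    for start in range(0, max(len(rest) - overlap, 1), stride):
        subs.append(early + rest[start:start + window])
    return subs

# Example: a 10-step trajectory, 2 early steps, window 4, overlap 2
subs = split_trajectory(list(range(10)), n_early=2, window=4, overlap=2)
# Every sub-trajectory begins with the preserved early steps [0, 1],
# and adjacent sub-trajectories share two later steps.
```

Each resulting sub-trajectory then fits within a normal SFT context budget while retaining the task's initial instructions and some continuity with its neighbors.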
Submission Number: 13