OPEN-SWE-TRACES: Advancing Dual-Mode Multilingual Distillation for Software Engineering Agents

ACL ARR 2026 May Submission13674 Authors

26 May 2026 (modified: 02 Jun 2026), ACL ARR 2026 May Submission, CC BY 4.0
Keywords: Autonomous Software Engineering, Agentic Trajectories, Hybrid-Reasoning Synthesis, LLMs, Offline Distillation, SWE-bench
Abstract: The path toward autonomous software engineering is currently bottlenecked by a severe deficit of diverse, large-scale trajectory data. We address this by introducing OPEN-SWE-TRACES, a dataset of 207,489 agentic trajectories spanning nine programming languages (Python, Go, TypeScript, JavaScript, Rust, Java, PHP, C, C++). The trajectories are sourced from 20,000 real-world pull requests (PRs) using the OpenHands and SWE-agent harnesses, and the dataset uses hybrid-reasoning synthesis: Minimax-M2.5 generates trajectories with explicit "thinking" processes, while Qwen3.5-122B provides high-quality behavioral traces. The data is drawn from SWE-rebench-V2 and filtered for permissive licenses (MIT, Apache, BSD), and it supports training models capable of long-horizon reasoning. We validate the dataset by fine-tuning the Qwen3-30B-A3B series (Thinking, Coder, and Instruct). The best-performing model achieves resolve rates of 60% on SWE-bench Verified, 46.2% on SWE-bench Multilingual, and 36.8% on SWE-bench Pro. These results establish OPEN-SWE-TRACES as a premier resource for distilling human-level software engineering capabilities into efficient, open-source agentic LLMs.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: corpus creation, benchmarking, evaluation.
Contribution Types: Data resources
Languages Studied: Coding languages
EMNLP 2026 AI Reviewing Experiment: no
Submission Number: 13674