WildLong: Synthesizing Realistic Long-Context Instruction Data at Scale

ICLR 2026 Conference Submission 19517 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: long-context, synthetic data, LLM
TL;DR: WildLong is a framework for scalably generating diverse, realistic long-context instruction datasets that boost LLMs’ performance on long-context tasks.
Abstract: Large language models (LLMs) with extended context windows enable tasks requiring extensive information integration but are limited by the scarcity of high-quality, diverse datasets for long-context instruction tuning. Existing data synthesis methods focus narrowly on objectives like fact retrieval and summarization, restricting their generalizability to complex real-world tasks. We introduce WildLong, a framework for generating diverse, scalable, and realistic instruction-response datasets tailored to long-context tasks. WildLong extracts meta-information from real user queries, models co-occurrence relationships via graph-based methods, and employs adaptive generation to produce data at scale. It extends beyond single-document tasks to support multi-document reasoning, such as cross-document comparison and aggregation. Our models, fine-tuned on 150K instruction-response pairs synthesized using WildLong, surpass existing open-source long-context-optimized models across benchmarks while maintaining strong performance on short-context tasks without incorporating supplementary short-context data. By generating a more diverse and realistic long-context instruction dataset, WildLong enhances LLMs' ability to generalize to complex, real-world reasoning over long contexts, establishing a new paradigm for long-context instruction tuning.
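To make the pipeline described in the abstract concrete, below is a minimal, hypothetical sketch of the graph-based co-occurrence step: meta-information tags extracted from user queries form nodes, co-occurrence counts form edge weights, and a weighted random walk samples plausible tag combinations for new synthetic instructions. All tag names and the walk procedure are illustrative assumptions, not the authors' actual implementation.

```python
import random
from collections import defaultdict
from itertools import combinations

# Hypothetical meta-information extracted from real user queries
# (task type, document type, reasoning operation); values are illustrative only.
query_metadata = [
    {"task": "summarization", "doc": "financial_report", "op": "aggregate"},
    {"task": "comparison", "doc": "legal_contract", "op": "cross_reference"},
    {"task": "fact_retrieval", "doc": "financial_report", "op": "locate"},
    {"task": "comparison", "doc": "financial_report", "op": "aggregate"},
]

# Build a weighted co-occurrence graph: nodes are meta-information values,
# edge weights count how often two values appear in the same query.
graph = defaultdict(lambda: defaultdict(int))
for meta in query_metadata:
    for a, b in combinations(meta.values(), 2):
        graph[a][b] += 1
        graph[b][a] += 1

def sample_combination(start, length=3, seed=None):
    """Weighted random walk over the co-occurrence graph to sample a
    plausible meta-information combination for one synthetic instruction."""
    rng = random.Random(seed)
    node, path = start, [start]
    for _ in range(length - 1):
        neighbors = list(graph[node].keys())
        if not neighbors:
            break
        weights = [graph[node][n] for n in neighbors]
        node = rng.choices(neighbors, weights=weights, k=1)[0]
        path.append(node)
    return path

print(sample_combination("financial_report", seed=0))
```

In this reading, each sampled combination would then be handed to an LLM prompt to generate a long-context instruction-response pair; the adaptive-generation details are outside the scope of this sketch.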
Primary Area: foundation or frontier models, including LLMs
Submission Number: 19517