# INFO

created by: anno_0
last modified: 3/4/2025
description:

This data was generated using the same process described in the paper:

Scaling Instruction-Tuned LLMs to Million-Token Contexts via Hierarchical Synthetic Data Generation
https://openreview.net/pdf?id=BkwCrIsTbR

Qwen2-72B-Instruct was used to hierarchically generate QA data from the original PG19 dataset.
