# Box Dataset

These directories contain all the data splits used in the paper [Entity Tracking in Language Models](https://arxiv.org/abs/2305.02363).

Splits:

 * T5 datasets (used for fine-tuning and evaluating T5):
   * `t5_boxes_nso_exp2_max3`: _Base_ and _Vocab_ splits.
   * `t5_boxes_nso_numops_trainlen=2_exp2_max3`: _NumOps_ split.
   * `t5_boxes_nso_exp2_max3_alt_forms_train`: _AltForms_ and _AltForms+NumOps_ splits.
   * `t5_boxes_nso_exp2_max3_move_contents`: _MoveContents_ split.
   * `t5_boxes_nso_exp2_max3_ambiref`: _AmbiRef_ split.

 * Zero-shot/few-shot datasets (used for in-context experiments):
   * `few_shot_boxes_nso_exp2_max3`: _Base_ split. 
   * `few_shot_boxes_nso_exp2_max3_move_contents`: _MoveContents_ split.
   * `few_shot_boxes_nso_exp2_max3_ambiref`: _AmbiRef_ split.

See the README.md file in each directory for a more detailed description of each split.
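
As a quick sanity check after downloading, the following minimal Python sketch reports which of the directories listed above are present and whether each contains its README.md. It assumes the split directories sit side by side under a single root folder. The helper name `find_split_dirs` and its `root` argument are hypothetical and not part of the dataset release, and the sketch makes no assumption about the file formats inside each directory.

```python
from pathlib import Path

# Directory names copied from the list of splits in this README.
T5_DIRS = [
    "t5_boxes_nso_exp2_max3",
    "t5_boxes_nso_numops_trainlen=2_exp2_max3",
    "t5_boxes_nso_exp2_max3_alt_forms_train",
    "t5_boxes_nso_exp2_max3_move_contents",
    "t5_boxes_nso_exp2_max3_ambiref",
]
FEW_SHOT_DIRS = [
    "few_shot_boxes_nso_exp2_max3",
    "few_shot_boxes_nso_exp2_max3_move_contents",
    "few_shot_boxes_nso_exp2_max3_ambiref",
]


def find_split_dirs(root):
    """Map each listed directory found under `root` to whether it has a README.md.

    Hypothetical helper: `root` is assumed to be the folder that holds
    the split directories.
    """
    root = Path(root)
    found = {}
    for name in T5_DIRS + FEW_SHOT_DIRS:
        split_dir = root / name
        if split_dir.is_dir():
            found[name] = (split_dir / "README.md").is_file()
    return found


if __name__ == "__main__":
    # Assumption: the script is run from the folder containing the split directories.
    for name, has_readme in find_split_dirs(".").items():
        status = "found" if has_readme else "missing"
        print(f"{name}: README.md {status}")
```

Directories that are not in the list are ignored, so the check can be run from a folder that also holds unrelated files.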