# Training on the Test Task

This repository contains the code to reproduce the experiments, figures, and tables of the paper *Training on the Test Task Confounds Evaluation and Emergence*.

* The folder [experiments/](experiments/) contains the code to fine-tune models on the task-relevant datasets considered in the paper and to evaluate models using the LM Evaluation Harness library.
* The folder [notebooks/evaluations](notebooks/evaluations) contains the model evaluation files.
* The Jupyter notebook [notebooks/figures.ipynb](notebooks/figures.ipynb) reproduces the figures and tables in the paper.
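
For readers unfamiliar with the LM Evaluation Harness mentioned above, here is a minimal sketch of how an evaluation command might be assembled. It is not the code in `experiments/`. It assumes the harness is installed and exposes its `lm_eval` command-line entry point. The helper `build_lm_eval_command`, the model name, the task names, and the output path are hypothetical placeholders.

```python
import shlex


def build_lm_eval_command(model_name, tasks, num_fewshot=0, batch_size=8,
                          output_path="results"):
    """Assemble an `lm_eval` command line for a Hugging Face model.

    Hypothetical helper for illustration only; it builds the argument
    list but does not run the evaluation.
    """
    if not tasks:
        raise ValueError("at least one task is required")
    return [
        "lm_eval",
        "--model", "hf",                              # Hugging Face backend
        "--model_args", f"pretrained={model_name}",   # checkpoint to load
        "--tasks", ",".join(tasks),                   # comma-separated task names
        "--num_fewshot", str(num_fewshot),            # in-context examples per prompt
        "--batch_size", str(batch_size),
        "--output_path", output_path,                 # where results are written
    ]


if __name__ == "__main__":
    # Placeholder model and tasks; substitute the ones you want to evaluate.
    cmd = build_lm_eval_command("EleutherAI/pythia-160m", ["mmlu", "gsm8k"],
                                num_fewshot=5)
    print(shlex.join(cmd))
```

The printed command can be pasted into a shell, or the list can be passed to `subprocess.run` once the harness is installed. Check the scripts in [experiments/](experiments/) for the exact settings used in the paper.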