# Datasets
A majority of the data we use in our experiments use New York times headlines

## Directory Structure
```
nyt
└── data
    ├── 2016
    │   ├── Business_headlines.json
    │   ├── Foreign_headlines.json
    │   ├── Politics_headlines.json
    │   ├── Washington_headlines.json
    ├── 2017
    ├── 2018
    ├── 2019
    ├── 2020
    ├── 2021
    ├── 2022
    ├── 2023
    ├── 2024
    ├── 20230801_20240218
    ├── headline_experiments
    ├── rated_headlines
```
* All sub-directories under ```datasets/nyt/data``` from 2016-2024 contain headlines from a single year organized by news desk and look like the above example for 2016.
* **20230801_20240218**: Contains headlines from the post-training cutoff "future" headline dataset we used in training sleeper agents and probes.
* **headline_experiments**: Contains files relating to experiments using altered versions of headlines, fake headlines, etc
* **rated_headlines**: Contains a subset of the headlines that have been assigned a year when GPT-4 along with reasoning for why GPT-4 thinks it occurred then. These are results from Section 3.2 of our paper
