# Future Probing
Code release for the Sections 1-3 of the paper [Future Events as Backdoor Triggers](https://arxiv.org/abs/2407.04108).

### Prerequisites
* Python >= 3.10
* Virtual environment tool
* Access to at least one GPU (RTX-series should be sufficient. A100/H100 not necessary)

### Setup
```
conda create -n <env-name> python3.10
conda activate <env-name>
pip install -r requirements.txt
```
Create a SECRETS file in the main repos
```
HUGGINGFACE_TOKEN=<TOKEN> #Make sure to use a token that has the appropriate access to be using LLama models.
OPENAI_API_KEY=<KEY>
REPLICATE_API_KEY=<KEY> #We use replicate as the API service to run inference on some of our models. Feel free to swap in another service just note you may have to make some adjustments to the fuutre_probing scripts
NYT_API_KEY=<KEY> #Used if pulling new data from NYT 
```

## Linear Probes

### Usage
We include instructions for training your own versions of the probes we train on activations from Llama 2 7B and 2 13B in the paper. In our experiments, we also trained probes using various parameter sizes of Qwen, Pythia, and Mistral. Model + parameter size combinations currently supported are listed below. This is also where the layers for which we pull activations are set. If you'd like to change the layers for which you pull activations, that can be easily done here as well. Any models that are currently in TransformerLens and fit in memory on the GPU(s) to which you have access should work.

#### Currently Supported Model Aliases:
* ```gpt2_medium```
* ```llama2_7b```
* ```llama2_13b```
* ```qwen_1.8b```
* ```qwen_7b```
* ```qwen_14b```
* ```pythia_160m```
* ```pythia_1.4b```
* ```pythia_6.9b```
* ```mistral_7b```

#### Train New Probes
The code is also currently designed to train probes on two different sets of years of headlines. The data necessary for replicating results in the paper is already pre-loaded in the requisite locations. Using your own datasets will require some manipulation of the file path setting logic as well as passing paths to your data files. 
To train a probe on llama 2 7B activations from headlines from years 2017-2019 v. 2023-2024 using weight decay 0.01 run:
```
python train_probes.py \
--model llama2_7b \
--weight_decay 0.001 \
--seed 42 \
--past_years 2017_2019 \
--future_years 2023_2024
```
By default all three types of probes referenced in the paper (single topic, mixed topic, and hold one out) are trained on Foreign, Washington, Politics, and Business headlines. Further, copies of these probes get saved to your local file system for inference later as do predictions. To adjust these settings, the following args can be passed with the above code:
```
--single_topic_probe: bool \
--mixed_topic_probe: bool \
--hold_one_out_probe: bool \
--save_probe: bool \
--get_predictions : bool
```

#### Run Inference on Trained Probes
We run inference on the altered "challenge" sets of headlines (paraphrased, untrue, etc). To do this, we don't need to train any new probes but just use existing ones saved in your ```probe_dir```. ```exp_type``` corresponds to a directory where data is stored under the datasets. Set the weight_decay, seed past_years, and future_years to match the trained probe on which you want to run inference.
```
python run_probe_inference.py \
--model_name llama2_7b,
--exp_type paraphrased,
--weight_decay 0.01,
--seed 42,
--past_years 2017_2019 \
--future_years 2023_2024
```
