The dataset required to run this code is available at the anonymized link https://figshare.com/s/1df69795de7c75ad3b09
Data should be unzipped and placed in the data folder. Additionally, llama2-7b-hf should be downloaded and placed in the provided folder.

First, build the embeddings for a single Llama 2 layer using the build_layers.py script. To replicate the paper, use layer 3 with a context length of 20.
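
The README does not define "context length". The sketch below shows one plausible reading: each token is embedded from a sliding window made of itself plus up to 19 preceding tokens, and the layer's hidden state at the window's last position is kept. The windowing scheme, the function name build_layer_embeddings, and the layer_fn callable are all hypothetical. They are not taken from build_layers.py.

```python
import numpy as np
from typing import Callable, Sequence


def build_layer_embeddings(
    token_ids: Sequence[int],
    layer_fn: Callable[[Sequence[int]], np.ndarray],
    context_len: int = 20,
) -> np.ndarray:
    """Hypothetical sketch: one embedding per token from a sliding context window.

    For token t the window is token_ids[max(0, t - context_len + 1) : t + 1],
    i.e. the token itself plus up to context_len - 1 preceding tokens.
    layer_fn (an assumed stand-in for a forward pass up to the chosen layer)
    maps a window to an array of shape (len(window), hidden_dim); only the row
    for the last position (token t) is kept.
    """
    rows = []
    for t in range(len(token_ids)):
        start = max(0, t - context_len + 1)
        window = token_ids[start : t + 1]
        hidden = layer_fn(window)  # shape: (len(window), hidden_dim)
        rows.append(hidden[-1])    # hidden state of the current token
    return np.stack(rows)          # shape: (len(token_ids), hidden_dim)
```

With Hugging Face transformers, layer_fn could wrap a forward pass called with output_hidden_states=True and return hidden_states[3][0]. Index 0 of hidden_states is the embedding-layer output, so index 3 is the output of the third decoder block. Whether build_layers.py counts layers the same way is an assumption.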

Next, run each model using ridge_regression.py, low_rank_crossval_sweep_single.py, low_rank_crossval_sweep.py and control_subtracted_runs.py.
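
The README does not say what ridge_regression.py fits. Assuming it is a standard linear ridge mapping from the layer embeddings (X) to the target data (Y), the sketch below shows a generic closed-form fit together with a k-fold sweep over the regularization strength. The names fit_ridge and kfold_ridge_mse are hypothetical. The actual scripts may differ in intercept handling, scoring metric and fold layout.

```python
import numpy as np


def fit_ridge(X: np.ndarray, Y: np.ndarray, alpha: float) -> np.ndarray:
    """Closed-form ridge (no intercept): W = (X^T X + alpha * I)^{-1} X^T Y."""
    n_features = X.shape[1]
    A = X.T @ X + alpha * np.eye(n_features)
    # Solving the linear system is more stable than forming the inverse.
    return np.linalg.solve(A, X.T @ Y)


def kfold_ridge_mse(X: np.ndarray, Y: np.ndarray, alphas, n_folds: int = 5) -> dict:
    """Mean held-out MSE for each alpha, using contiguous (unshuffled) folds."""
    n = X.shape[0]
    folds = np.array_split(np.arange(n), n_folds)
    scores = {}
    for alpha in alphas:
        fold_mse = []
        for test_idx in folds:
            train_mask = np.ones(n, dtype=bool)
            train_mask[test_idx] = False  # hold out this fold
            W = fit_ridge(X[train_mask], Y[train_mask], alpha)
            resid = Y[test_idx] - X[test_idx] @ W
            fold_mse.append(np.mean(resid ** 2))
        scores[alpha] = float(np.mean(fold_mse))
    return scores
```

Contiguous folds are used here on the assumption that samples are ordered in time. With time-ordered data, shuffling would let overlapping context windows leak between training and held-out folds.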

Finally, the evaluation plots from the paper are generated by the Jupyter notebooks (.ipynb files).

Note that a GPU is required for both model fitting and plotting.