# Beyond Memorization: Violating Privacy via Inference with Large Language Models

This is the repository for the ICLR 2024 Paper Submission 9451, "Beyond Memorization: Violating Privacy via Inference with Large Language Models".

As our PersonalReddit dataset contains personal information of individuals, we do not make it available within this repo. Instead, we provide a list of 525 synthetic examples alongside the supplemental material.


## Setup

We note that several Python scripts in this repository require you to set your current PYTHONPATH accordingly. We marked the relevant places in the source code with the placeholder `/your/workspace/path`.
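
As a minimal sketch of one way to do this, the snippet below prepends the workspace to `PYTHONPATH` for the current shell session. The `WORKSPACE` variable name is our own choice for illustration, and `/your/workspace/path` is the same placeholder used in the source code; replace it with the absolute path of your local clone.

```bash
# Assumption: WORKSPACE is an illustrative variable name, not one used by the repo.
# Replace the placeholder with the absolute path of your local clone.
WORKSPACE="/your/workspace/path"

# Prepend the workspace; keep any existing PYTHONPATH entries after it.
export PYTHONPATH="${WORKSPACE}${PYTHONPATH:+:${PYTHONPATH}}"

# Print the result to confirm the workspace is now on the path.
echo "PYTHONPATH=${PYTHONPATH}"
```

The setting only lasts for the current shell session, so it has to be repeated (or added to your shell profile) for new terminals.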

For all experiments, including evaluations, we provide a config file in the folder `configs`. See the "Running" section below for how to use them.

Note: To run the experiments, you have to set the credentials in `credentials.py` to use the OpenAI API. For non-API models, you must set up your environment according to the respective model's requirements (e.g., Llama-2 requires permission from Meta).

Lastly, you can install the environment we used from our conda `environment.yaml`. Simply use the command `conda env create -f environment.yaml` (on Linux). We give install instructions for `mamba` below.

## Running

To run any experiment proceed as follows:

```bash
conda activate gen_leak
python ./main.py --config_path <your_config>
```

We provide our configs in the `configs` folder.
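
If you launch runs repeatedly, a small wrapper can catch a mistyped config path before Python starts. The following is only a sketch: `run_experiment` is a hypothetical helper name that is not part of this repository, and it assumes you call it from the repository root with the `gen_leak` environment already activated.

```bash
# Hypothetical helper (not part of the repository): checks that the config
# file exists, then launches main.py with it.
run_experiment() {
  if [ "$#" -ne 1 ]; then
    echo "usage: run_experiment <your_config>" >&2
    return 2
  fi
  if [ ! -f "$1" ]; then
    echo "Config not found: $1" >&2
    return 1
  fi
  # Same invocation as shown above, with the path quoted.
  python ./main.py --config_path "$1"
}
```

Calling `run_experiment <your_config>` then behaves like the command above, but returns a non-zero status without starting Python when the file does not exist.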

For PersonalReddit, one can evaluate a file with ground-truth labels via the evaluation configs provided in `configs`.

To generate plots we provide several plotting utility scripts in `src/visualization` along with commonly used scripts in `scripts`.

## Mamba Environment

Install mamba via:

```bash
curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh"
bash Mambaforge-$(uname)-$(uname -m).sh
```

Mamba can then be used to install our environment via `mamba env create -f environment.yaml`.
