# ContextBench

## 🔒 Double-blind review instructions 🔒
1. **Authenticate once (read-only token)**

   ```bash
   export HF_TOKEN=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
   huggingface-cli login --token "$HF_TOKEN" --add-to-git-credential
   ```

## Codebase

* The core functionality is implemented in subdirectories of `src/eliciting_contexts`:
    * `src/eliciting_contexts/contextbench` contains the method-agnostic code for running the benchmark, which will be made open source.
    * `src/eliciting_contexts/benchmark` contains the code used in our paper to run EPO and the baseline methods on the benchmark.
* We use modified copies of the following repos, which live in `external`:
    * `external/custom_dreamy` - EPO
    * `external/llada` - LLaDA inpainting
    * `external/sandbagging_research_sprint_master` - Sandbagging setup

## Setup

* The project uses `uv` as its package manager. Install it if it is not already installed:
```
curl -LsSf https://astral.sh/uv/install.sh | sh
```
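
The "if not already installed" condition can be checked in the shell before running the installer. The following is a minimal sketch, not part of the repository: the function name `ensure_uv` is hypothetical, and it assumes that having `uv` on `PATH` is a sufficient sign that it is installed.

```bash
# Hypothetical helper: run the uv installer only when uv is not already on PATH.
ensure_uv() {
  if command -v uv >/dev/null 2>&1; then
    # uv was found, so report its version and skip the download.
    echo "uv already installed: $(uv --version)"
  else
    # Same installer command as above.
    curl -LsSf https://astral.sh/uv/install.sh | sh
  fi
}
```

Calling `ensure_uv` before installing the dependencies makes the setup safe to re-run on a machine that already has `uv`.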
<details>

<summary>Installing dependencies using the Makefile (Recommended)</summary>

* Use the provided Makefile to install the dependencies:

```
make env-setup
```
</details>

<details>

<summary>Manually installing dependencies</summary>

* Install the dependencies from the lock file:

```
uv sync
```
* Install the pre-commit git hooks so that they run on every commit:
```
pre-commit install
```
</details>


### Model access
* Log in to Hugging Face to access gated models, using either the CLI or Python:
```
# Option 1: from the command line
huggingface-cli login
# Option 2: from Python code
import huggingface_hub
huggingface_hub.login()
```

## Debugging

```
# Create the venv with a specific Python version
uv venv -p 3.11

# Install Python 3.11 (Debian/Ubuntu)
sudo apt install python3.11

# Add the current directory to PYTHONPATH
export PYTHONPATH="$(pwd):$PYTHONPATH"

# Create an SSH key
ssh-keygen -t rsa -b 4096 -f ~/.ssh/vast_key

# Install the dependencies into the venv
uv sync

# Install flash-attn separately
uv pip install flash-attn --no-build-isolation

# Start the SSH agent and add a key to it (Linux)
eval "$(ssh-agent -s)"
ssh-add ~/.ssh/id_ed25519
```
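
Running the `PYTHONPATH` export above more than once in the same shell adds the directory again each time. A minimal sketch of an idempotent variant is shown below; the function name `add_to_pythonpath` is hypothetical and not part of the repository.

```bash
# Hypothetical helper: prepend a directory to PYTHONPATH only if it is not already listed.
add_to_pythonpath() {
  dir="$1"
  case ":${PYTHONPATH:-}:" in
    *":$dir:"*) ;;  # already present, leave PYTHONPATH unchanged
    *) PYTHONPATH="$dir${PYTHONPATH:+:$PYTHONPATH}" ;;  # prepend, adding ":" only when needed
  esac
  export PYTHONPATH
}

# Add the current directory, as in the export above.
add_to_pythonpath "$(pwd)"
```

The `case` pattern wraps the variable in colons so that a directory is matched as a whole entry and not as a substring of another path.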
