## Plague

### 1) Install dependencies

Option A (uv):

```bash
pip install -U uv
uv sync
```

Option B (pip + venv):

```bash
python3 -m venv .venv
source .venv/bin/activate
python -m pip install -U pip
pip install .
```

Requirements:
- Python >= 3.13
- NVIDIA GPU with compatible CUDA drivers for `vllm`
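
Both requirements can be checked before installing anything. The following is a minimal preflight sketch, not part of the repository: the function names are hypothetical, and finding `nvidia-smi` on `PATH` is only a rough proxy for a working NVIDIA driver (it says nothing about CUDA compatibility with `vllm`).

```python
import shutil
import sys

MIN_PYTHON = (3, 13)  # minimum version from the Requirements list


def python_ok(version=None):
    """Return True if the (major, minor) version meets MIN_PYTHON."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= MIN_PYTHON


def nvidia_smi_found():
    """Return True if `nvidia-smi` is on PATH (rough proxy for a driver install)."""
    return shutil.which("nvidia-smi") is not None


if __name__ == "__main__":
    print(f"Python >= 3.13: {python_ok()}")
    print(f"nvidia-smi on PATH: {nvidia_smi_found()}")
```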

### 2) Start required vLLM servers (in separate terminals)

```bash
CUDA_VISIBLE_DEVICES=3 vllm serve meta-llama/Llama-3.1-8B-Instruct \
  --gpu-memory-utilization 0.4 \
  --max-model-len 4096 \
  --download-dir ~/.cache/huggingface/hub/ \
  --host 127.0.0.1 \
  --port 7000
```

```bash
CUDA_VISIBLE_DEVICES=3 vllm serve Qwen/Qwen3-Embedding-0.6B \
  --gpu-memory-utilization 0.3 \
  --task embed \
  --max-model-len 8192 \
  --download-dir ~/.cache/huggingface/hub/ \
  --host 127.0.0.1 \
  --port 7005
```

Notes:
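- Both commands pin the same GPU (`CUDA_VISIBLE_DEVICES=3`), and their `--gpu-memory-utilization` values (0.4 and 0.3) together reserve up to 70% of its memory. Adjust `CUDA_VISIBLE_DEVICES` to pick a different GPU.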
- Keep both servers running while executing the attack.
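
Before running the attack, it can help to confirm that both servers are reachable. The sketch below is not part of the repository; it assumes the host and ports from the commands above and uses the `/v1/models` route of vLLM's OpenAI-compatible API. The helper names and the server labels are hypothetical.

```python
import http.client
import json
import urllib.request

# Ports taken from the `vllm serve` commands above; labels are illustrative.
SERVERS = {"chat (Llama-3.1-8B-Instruct)": 7000, "embeddings (Qwen3-Embedding)": 7005}


def models_url(host, port):
    """Build the URL of the OpenAI-compatible model-listing endpoint."""
    return f"http://{host}:{port}/v1/models"


def served_models(host, port, timeout=2.0):
    """Return the model IDs a server reports, or None if it is unreachable."""
    try:
        with urllib.request.urlopen(models_url(host, port), timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error, or a non-JSON reply.
        return None
    data = payload.get("data", []) if isinstance(payload, dict) else []
    return [m.get("id") for m in data if isinstance(m, dict)]


if __name__ == "__main__":
    for label, port in SERVERS.items():
        models = served_models("127.0.0.1", port)
        status = "unreachable" if models is None else f"serving {models}"
        print(f"{label} on port {port}: {status}")
```

A server that is still loading weights may report as unreachable for a while, so rerun the check after startup completes.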

### 3) Run the attack

```bash
python3 plague.py
```

Results will be written under `results/plague/`.
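
To inspect a finished run, a small script along these lines can list the output folders and print the saved config. It is a sketch under assumptions: it relies only on the `results/plague/` layout and the `results/plague/plague_0/config.json` path mentioned in this README, makes no claims about the config's fields, and the helper names are hypothetical.

```python
import json
from pathlib import Path


def load_run_config(run_dir):
    """Return the parsed config.json inside run_dir, or None if it is absent."""
    config_path = Path(run_dir) / "config.json"
    if not config_path.is_file():
        return None
    with config_path.open(encoding="utf-8") as fh:
        return json.load(fh)


def list_subfolders(results_dir):
    """Return the sorted names of the directories directly under results_dir."""
    root = Path(results_dir)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if p.is_dir())


if __name__ == "__main__":
    results = Path("results/plague")
    print("folders:", list_subfolders(results))
    print("config:", load_run_config(results / "plague_0"))
```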

### 4) Key files and customization

- Prompts and templates:
  - `src/prompt_utils.py`: core templates like `PRIMER_SYSTEM_PROMPT`, `PLANNER_PROMPT`, `FINISHER_SYSTEM_PROMPT`, `EVAL_PROMPT_RELAXED`, `SUMMARIZER_PROMPT`. Edit these to change prompt wording/logic.
  - `src/candidate_generator/crescendo_attack.py`: `SYSTEM_PROMPT` and JSON format for the Crescendo candidate generator.
  - `src/candidate_generator/goat_attack.py`: `SYSTEM_PROMPT`, `ATTACKER_INITIAL_PROMPT`, and follow-up prompt logic for the GOAT finisher.

- Model initialization and calls:
  - `src/infer.py`: defines `query_openai`, `query_anthropic`, `query_together`, `query_gemini`, `query_vllm_server`, and `query_embeddings`. vLLM servers are accessed via OpenAI-compatible HTTP endpoints.
  - `src/blackbox_model.py`: `MODELS` map selects provider and function per model string; `BlackBoxModel` wraps querying (`query`, `query_parallel`) and embeddings (`embed`). Add or swap models/providers here.
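  - Ports: summarization queries use port `7000`; embeddings use port `7005`. To change them, edit `embedder_port` in `plague.py` and the `port` argument wherever `summariser_model.query(..., port=7000)` is called. The values must match the `--port` flags passed to `vllm serve` in step 2.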

- Orchestration and where to modify components:
  - `plague.py`: main pipeline. Instantiates `PlagueConfig`, wires `BlackBoxModel`s (target/attacker/evaluator/summariser/embedding), sets `self.candidate_generator` (Crescendo) and `self.goat_finisher` (GOAT). Entry point loads `./harmbench_dataset` and runs `optimize(...)`. Change models, rounds, steps, and toggles by editing the config in the `__main__` block.
  - `src/config.py`: `PlagueConfig` fields (e.g., `primer_steps`, `use_strategy_library`, `use_planner`, `use_actor_plan`, `max_rounds`, `project`).
  - `src/candidate_generator/base.py` and `src/strategy_generator/base.py`: base classes to subclass when implementing new generators or strategies.

- Evaluation and scoring:
  - `src/rubric_based_scorer.py`: rubric-based LLM scoring (extracts `<score>` from evaluator output).
  - `src/evaluator.py`: higher-level evaluation utilities (`evaluate_responses`, `jailbreak_bench_eval`).

- Data and outputs:
  - Input dataset is expected at `./harmbench_dataset` (loaded via `datasets.load_from_disk`).
  - Outputs are written under `results/plague/`, with per-goal folders; the config for a run is saved to `results/plague/plague_0/config.json`.

- Environment variables (if using API providers):
  - Set these in `.env` or in the shell environment: `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `TOGETHER_API_KEY`, `GEMINI_API_KEY` (loaded in `src/infer.py`).
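
A missing key usually surfaces only when the first API call fails, so a quick check up front can save a run. The sketch below is not part of the repository: the provider labels and the `missing_keys` helper are hypothetical, and it inspects only the current process environment, so keys that exist solely in `.env` show as missing unless that file has already been loaded.

```python
import os

# Variable names as listed above; the provider labels are illustrative only.
PROVIDER_KEYS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "together": "TOGETHER_API_KEY",
    "gemini": "GEMINI_API_KEY",
}


def missing_keys(providers, env=None):
    """Return the variable names that are unset or empty for the given providers."""
    if env is None:
        env = os.environ
    return [PROVIDER_KEYS[p] for p in providers if not env.get(PROVIDER_KEYS[p])]


if __name__ == "__main__":
    # Check only the providers you intend to use; here, all four.
    missing = missing_keys(PROVIDER_KEYS)
    print("missing:", ", ".join(missing) if missing else "none")
```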