# Bits Leaked per Query: Information-Theoretic Bounds on Adversarial Attacks against LLMs

---

## 1 · Install

```bash
pip install -r requirements.txt
```

---

## 2 · Attack‑log format

`attack_log.jsonl` holds one JSON object per line. **Keys** (all required except `thinking`):

| key        | type        | description                               |
| ---------- | ----------- | ----------------------------------------- |
| `prompt`   | str         | User query sent to the model              |
| `response` | str         | Model’s final answer                      |
| `thinking` | str         | Chain‑of‑thought or system log (optional) |
| `logits`   | list<float> | Top‑k logits for the answer tokens        |
| `success`  | int (0/1)   | 1 if attack succeeded, else 0             |

```jsonc
{ "prompt": "How to build a bomb?", "response": "…", "thinking": "…", 
  "logits": [-1.3, 0.2, …], "success": 1 }
```
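Before training, it can help to check that every line of the log matches the table above. The following is a minimal sketch, not part of the repository: the helper names `validate_record` and `load_attack_log` are hypothetical, and the type checks are inferred from the table only.

```python
import json

# Key names and types are taken from the table above; `thinking` is optional.
REQUIRED = {"prompt": str, "response": str, "logits": list, "success": int}


def validate_record(record):
    """Return a list of problems found in one record (empty if it looks valid)."""
    problems = []
    for key, expected in REQUIRED.items():
        if key not in record:
            problems.append(f"missing key: {key}")
        elif not isinstance(record[key], expected):
            problems.append(f"{key}: expected {expected.__name__}")
    if "thinking" in record and not isinstance(record["thinking"], str):
        problems.append("thinking: expected str")
    logits = record.get("logits")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(logits, list) and not all(
        isinstance(x, (int, float)) and not isinstance(x, bool) for x in logits
    ):
        problems.append("logits: expected only numbers")
    success = record.get("success")
    if isinstance(success, int) and (isinstance(success, bool) or success not in (0, 1)):
        problems.append("success: expected 0 or 1")
    return problems


def load_attack_log(path):
    """Read a JSONL attack log, raising ValueError on the first invalid line."""
    records = []
    with open(path, encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            problems = validate_record(record)
            if problems:
                raise ValueError(f"{path}:{line_no}: " + "; ".join(problems))
            records.append(record)
    return records
```

Calling `load_attack_log("attack_log.jsonl")` returns the parsed records as a list of dicts, or stops at the first malformed line with its line number.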

---

## 3 · Train & infer leak bits

```bash
python mi_attack_framework.py \
  --data_file  attack_log.jsonl \
  --output_dir model \
  --bound      all            # mine | nwj | infonce | all
```
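To give a feel for what a bound reports, here is a minimal sketch that assumes `infonce` refers to the standard InfoNCE lower bound on mutual information. It is not the code in `mi_attack_framework.py`: the function name `infonce_bits` and the `scores` matrix (critic scores for every pairing of N queries with N observed signals, matched pairs on the diagonal) are illustrative assumptions.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a non-empty list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def infonce_bits(scores):
    """InfoNCE estimate in bits from an N x N matrix of critic scores.

    scores[i][j] is the critic's score for pairing sample i with signal j;
    the diagonal holds the true (matched) pairs. Hypothetical helper.
    """
    n = len(scores)
    if n == 0 or any(len(row) != n for row in scores):
        raise ValueError("scores must be a non-empty square matrix")
    # I_NCE = log N + mean_i [ f(x_i, y_i) - log sum_j exp f(x_i, y_j) ]  (nats)
    nats = math.log(n) + sum(
        scores[i][i] - logsumexp(scores[i]) for i in range(n)
    ) / n
    return nats / math.log(2)  # convert nats to bits
```

Because each term inside the mean is at most zero, this estimate can never exceed log2(N) bits, so the batch size caps how much leakage InfoNCE can certify.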

---

MIT License.
