# FlashTrace Experiment 3: Long/Short CoT Comparison (Case Study)

This directory provides a minimal, reproducible experiment that compares long and short chain-of-thought (CoT) responses:
- From RULER `niah_mq_q2 (1024)`, select two kinds of samples:
  - short-CoT: short reasoning followed by a `\box{}` final answer
  - long-CoT: long reasoning followed by a `\box{}` final answer
- Run only `attnlrp` (hop0) and compute only token-level `recovery@10%`, with gold tokens taken from `needle_spans`.
- Save traces (npz + manifest) to `exp/exp3/output/`, in a format aligned with the trace conventions of `exp/exp2/run_exp.py`.
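
To make the metric concrete, here is a minimal sketch of one plausible reading of token-level `recovery@10%`: the share of gold tokens that land among the top 10% of tokens ranked by attribution score. The function names, the half-open `(start, end)` span layout, and the rounding rule are assumptions for illustration; `run_exp.py` may define the metric differently.

```python
def spans_to_indices(needle_spans):
    """Expand (start, end) token spans into a sorted list of token indices.

    Assumption: spans are half-open, i.e. `end` is exclusive.
    """
    indices = set()
    for start, end in needle_spans:
        indices.update(range(start, end))
    return sorted(indices)


def recovery_at_fraction(scores, gold_indices, top_fraction=0.1):
    """Fraction of gold tokens found among the top-scoring tokens."""
    gold = set(gold_indices)
    if not gold:
        raise ValueError("gold_indices must not be empty")
    # Number of tokens kept, rounded down, but never fewer than one.
    k = max(1, int(top_fraction * len(scores)))
    # Rank token positions by attribution score, highest first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top_k = set(ranked[:k])
    return len(gold & top_k) / len(gold)


# Toy example: 10 tokens, gold needle tokens at positions 1 and 3.
scores = [0.05, 0.9, 0.1, 0.8, 0.02, 0.01, 0.03, 0.04, 0.0, 0.06]
gold = spans_to_indices([(1, 2), (3, 4)])
print(recovery_at_fraction(scores, gold, top_fraction=0.1))  # 0.5
print(recovery_at_fraction(scores, gold, top_fraction=0.2))  # 1.0
```

With ten tokens, a 10% budget keeps a single token, so only one of the two gold tokens can be recovered; doubling the budget recovers both.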

## 1) Sampling and Filtering (Generation + Judge)

By default, the script reads:
`data/ruler_multihop/1024/niah_mq_q2/validation.jsonl`

Requires an OpenAI-compatible chat API (default `http://localhost:4000/v1`) and an API key.

```bash
export FLASHTRACE_API_KEY=...  # or OPENAI_API_KEY

python exp/exp3/sample_and_filter.py \
  --tokenizer_model /opt/share/models/Qwen/Qwen3-8B/ \
  --min_long_thinking_tokens 512 \
  --max_short_thinking_tokens 256
```
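
The two thresholds in the command above can be read as a simple bucketing rule on the number of reasoning tokens. The sketch below illustrates that rule under stated assumptions: the helper names are hypothetical, the boundaries are treated as inclusive, and the thinking-token count is assumed to have been computed already with the tokenizer passed via `--tokenizer_model`. It does not cover the judge step that `sample_and_filter.py` also performs.

```python
import re

# Matches \box{...} with no nested braces inside the answer.
BOX_PATTERN = re.compile(r"\\box\{([^{}]*)\}")


def extract_boxed_answer(text):
    # Return the content of the last \box{...} in the text, or None.
    matches = BOX_PATTERN.findall(text)
    return matches[-1].strip() if matches else None


def classify_cot(num_thinking_tokens, response_text,
                 max_short_tokens=256, min_long_tokens=512):
    # A response without a boxed final answer is rejected outright.
    if extract_boxed_answer(response_text) is None:
        return None
    if num_thinking_tokens <= max_short_tokens:
        return "short"
    if num_thinking_tokens >= min_long_tokens:
        return "long"
    # Reasoning lengths between the two thresholds fit neither bucket.
    return None


print(classify_cot(120, r"The needle is 7. \box{7}"))   # short
print(classify_cot(900, r"Long reasoning ... \box{7}"))  # long
print(classify_cot(300, r"Medium reasoning \box{7}"))    # None
```

Leaving a gap between the two thresholds keeps the short and long groups clearly separated, which is presumably why the defaults do not meet in the middle.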

Output (default):
- `exp/exp3/data/niah_mq_q2_short_cot.jsonl`
- `exp/exp3/data/niah_mq_q2_long_cot.jsonl`

Notes:
- By default, one sample of each type is collected; use `--max_short` / `--max_long` to set the two counts separately (`--max_pairs` is a compatibility alias that sets both).

## 2) Attribution and Recovery (AttnLRP hop0)

```bash
python exp/exp3/run_exp.py \
  --model qwen-8B \
  --model_path /opt/share/models/Qwen/Qwen3-8B/ \
  --cuda 3,4,5,7
```

Output:
- Recovery CSV: `exp/exp3/output/recovery/<dataset>/<model>/attnlrp_1_examples.csv`
- Traces: `exp/exp3/output/traces/<dataset>/<model>/<run_tag>/ex_*.npz` + `manifest.jsonl`
- Summary JSON: `exp/exp3/output/recovery/summary_<model>.json`
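
To check what a run actually wrote, a small inspection helper can be useful. The sketch below assumes only what is listed above: a run directory containing `ex_*.npz` files and a `manifest.jsonl` with one JSON object per line. The function name `summarize_run` is hypothetical, and no particular array names or manifest fields are assumed. It relies on the fact that an `.npz` file is a zip archive with one `.npy` member per saved array, so the array names can be listed with the standard library alone.

```python
import json
import zipfile
from pathlib import Path


def summarize_run(run_dir):
    """Return (manifest records, {npz file name: [array names]})."""
    run_dir = Path(run_dir)

    # Parse the manifest, skipping blank lines.
    records = []
    manifest_path = run_dir / "manifest.jsonl"
    if manifest_path.exists():
        with manifest_path.open(encoding="utf-8") as f:
            records = [json.loads(line) for line in f if line.strip()]

    # List the arrays stored in each trace without loading their data.
    arrays = {}
    for npz_path in sorted(run_dir.glob("ex_*.npz")):
        with zipfile.ZipFile(npz_path) as zf:
            arrays[npz_path.name] = [Path(name).stem for name in zf.namelist()]
    return records, arrays
```

Calling `summarize_run(...)` on a concrete `<run_tag>` directory returns the parsed manifest records alongside the array names in each trace; the arrays themselves can then be loaded with `numpy.load`.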

Common parameters:
- `--top_fraction`: fraction of top-ranked tokens used when computing recovery (default 0.1)
- `--attnlrp_neg_handling drop|abs`
- `--attnlrp_norm_mode norm|no_norm`
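
The option names suggest how attribution scores are post-processed before ranking. The sketch below shows one plausible interpretation and is an assumption throughout, not a description of `run_exp.py`: `drop` is taken to zero out negative scores, `abs` to use magnitudes, `norm` to rescale the scores so they sum to 1, and `no_norm` to leave them unscaled. The function name `postprocess_scores` is hypothetical.

```python
def postprocess_scores(scores, neg_handling="drop", norm_mode="norm"):
    # Assumed semantics: "drop" clamps negatives to zero, "abs" keeps magnitudes.
    if neg_handling == "drop":
        processed = [max(s, 0.0) for s in scores]
    elif neg_handling == "abs":
        processed = [abs(s) for s in scores]
    else:
        raise ValueError(f"unknown neg_handling: {neg_handling!r}")

    # Assumed semantics: "norm" rescales to sum to 1, "no_norm" is a no-op.
    if norm_mode == "norm":
        total = sum(processed)
        if total > 0:
            processed = [s / total for s in processed]
    elif norm_mode != "no_norm":
        raise ValueError(f"unknown norm_mode: {norm_mode!r}")
    return processed


raw = [1.0, -1.0, 3.0]
print(postprocess_scores(raw, "drop", "norm"))    # [0.25, 0.0, 0.75]
print(postprocess_scores(raw, "abs", "no_norm"))  # [1.0, 1.0, 3.0]
```

Under this reading, the choice matters for recovery: with `abs`, a strongly negative token can enter the top fraction, whereas with `drop` it never can.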
