# FlashTrace Experiment 5: Cross-model (Qwen → Llama) Token-span Mapping

## Background: Why Mapping is Needed

Attribution and evaluation in `exp/exp2/run_exp.py` are strictly **token-level** and depend on the token-span fields stored in the cached data:

- `indices_to_explain = [start_tok, end_tok]` (generation token indices, closed interval)
- `sink_span` / `thinking_span` (also generation token spans)

These spans are computed once, when the cache is generated (`exp/exp2/sample_and_filter.py`, `exp/exp2/map_math_mine_to_exp2_cache.py`), using a specific tokenizer (usually that of `Qwen3-8B`), and stay fixed from then on.
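
The closed-interval convention used by these span fields can be made concrete with a small sketch. The helper name `span_to_slice` and the toy token list are illustrative assumptions, not part of the exp2 code:

```python
def span_to_slice(span):
    """Convert a closed [start_tok, end_tok] span into a Python slice.

    Hypothetical helper for illustration; exp2 may index spans differently.
    """
    start_tok, end_tok = span
    if start_tok < 0 or end_tok < start_tok:
        raise ValueError(f"invalid closed span: {span}")
    # Closed interval: end_tok is included, so the slice stops at end_tok + 1.
    return slice(start_tok, end_tok + 1)


# Toy generation tokens (made up for this example).
generation_tokens = ["The", " answer", " is", " 42", "."]
indices_to_explain = [3, 3]  # a single-token span

print(generation_tokens[span_to_slice(indices_to_explain)])  # [' 42']
```

Because both ends are inclusive, a span covering one token has `start_tok == end_tok`, and the largest valid `end_tok` is the token count minus one.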

When you switch to a new model (e.g., `Llama-3.1-8B-Instruct`), the **tokenizer differs**, so the token count and token boundaries of `target` change. The old spans then often fall out of bounds under the new tokenizer, and exp2 fails immediately during attribution (`IndexError: end_tok out of range`).
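
A minimal sketch of why an unmapped span breaks, assuming made-up token counts for the same `target` under the two tokenizers (the numbers and the helper `span_in_bounds` are hypothetical):

```python
def span_in_bounds(span, num_tokens):
    """Return True if a closed [start_tok, end_tok] span fits in num_tokens tokens."""
    start_tok, end_tok = span
    return 0 <= start_tok <= end_tok < num_tokens


# Hypothetical token counts of one `target` under each tokenizer.
old_num_tokens = 12  # tokenizer that produced the cache
new_num_tokens = 9   # tokenizer of the new model

span = [10, 11]  # computed with the old tokenizer

print(span_in_bounds(span, old_num_tokens))  # True
print(span_in_bounds(span, new_num_tokens))  # False
```

The opposite case is quieter: if the new tokenizer yields more tokens, the old span can stay in bounds yet cover different text, so an in-bounds check alone does not show that a span is still correct.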

## Solution: exp5 Mapping Script

`exp/exp5/map_exp2_cache_token_spans.py` maps the token spans in an exp2 cache from the old tokenizer (default `Qwen3-8B`) to the new tokenizer (default `Llama-3.1-8B-Instruct`) and writes the result to:

`exp/exp5/data/<same dataset name>.jsonl`

Mapping strategy (default):
1) Run the old tokenizer on `target` with `return_offsets_mapping=True`
2) Convert the old token span to a character interval in `target`
3) Run the new tokenizer on the same `target` with offsets, then map the character interval back to a new token span
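
The three steps above can be sketched as follows. This is an illustration under stated assumptions, not the script's actual code: two regex-based toy tokenizers stand in for the real ones so the example runs without model files, and the function names are hypothetical. With Hugging Face fast tokenizers, the offset lists would come from `tokenizer(target, return_offsets_mapping=True)["offset_mapping"]`.

```python
import re


def toy_offsets(text, pattern):
    """Stand-in for a tokenizer's offset mapping: one (char_start, char_end) per token."""
    return [m.span() for m in re.finditer(pattern, text)]


def token_span_to_char_span(span, offsets):
    """Step 2: closed token span -> half-open character interval."""
    start_tok, end_tok = span
    return offsets[start_tok][0], offsets[end_tok][1]


def char_span_to_token_span(char_span, offsets):
    """Step 3: character interval -> closed token span of all overlapping tokens."""
    char_start, char_end = char_span
    hits = [i for i, (s, e) in enumerate(offsets) if s < char_end and e > char_start]
    if not hits:
        return None  # no token overlaps the interval: mapping failed
    return [hits[0], hits[-1]]


target = "The answer is 1234."
# Toy "old" tokenizer: keeps the number as one token.
old_offsets = toy_offsets(target, r"\s?\w+|[^\w\s]")
# Toy "new" tokenizer: splits digits into chunks of at most two.
new_offsets = toy_offsets(target, r"\s?[A-Za-z]+|\s?\d{1,2}|[^\w\s]")

old_span = [3, 3]  # " 1234" under the old tokenizer
char_span = token_span_to_char_span(old_span, old_offsets)
print(char_span)                                        # (13, 18)
print(char_span_to_token_span(char_span, new_offsets))  # [3, 4]
```

The overlap test keeps every new token that touches the character interval, so the new span can be slightly wider than the old one when token boundaries do not line up. It also skips zero-width offsets such as the `(0, 0)` entries that fast tokenizers report for special tokens.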

For extreme cases (the cache was not produced by the expected old tokenizer), enable `--allow_fallback_answer`. The script then falls back to using `metadata.boxed_answer` (or `reference_answer`) to relocate the span under the new tokenizer.
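
One way such a fallback could work is to search for the answer string in `target` and map its character range to tokens. The sketch below is an assumption about the general idea only: the function name is hypothetical, and choosing the last occurrence of the answer is a guess, not confirmed behavior of the script.

```python
import re


def relocate_answer_span(target, answer, new_offsets):
    """Locate `answer` in `target` and return the closed token span covering it.

    Assumption: the last occurrence is used, since a final answer usually
    appears at the end of a generation. Returns None if nothing is found.
    """
    if not answer:
        return None
    char_start = target.rfind(answer)
    if char_start < 0:
        return None
    char_end = char_start + len(answer)
    hits = [i for i, (s, e) in enumerate(new_offsets) if s < char_end and e > char_start]
    return [hits[0], hits[-1]] if hits else None


target = "x = 7, so the answer is 7"
# Whitespace-split offsets stand in for the new tokenizer's offset mapping.
new_offsets = [m.span() for m in re.finditer(r"\S+", target)]

print(relocate_answer_span(target, "7", new_offsets))   # [7, 7]
print(relocate_answer_span(target, "99", new_offsets))  # None
```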

---

## Step 1: Map exp2 Dataset Cache to exp5/data

Using the repo's venv is recommended:

```bash
.venv/bin/python exp/exp5/map_exp2_cache_token_spans.py \
  --in_jsonl exp/exp2/data/niah_mq_q2.jsonl \
  --out_dir exp/exp5/data \
  --old_tokenizer_model /opt/share/models/Qwen/Qwen3-8B \
  --new_tokenizer_model /opt/share/models/meta-llama/Llama-3.1-8B-Instruct
```

Map multiple datasets at once (example: RULER + math):

```bash
.venv/bin/python exp/exp5/map_exp2_cache_token_spans.py \
  --in_jsonl exp/exp2/data/niah_mq_q2.jsonl exp/exp2/data/math.jsonl \
  --out_dir exp/exp5/data \
  --old_tokenizer_model /opt/share/models/Qwen/Qwen3-8B \
  --new_tokenizer_model /opt/share/models/meta-llama/Llama-3.1-8B-Instruct
```

Add `--overwrite` if the output file already exists.

Default behavior: if a sample cannot be mapped, the script **drops** it and reports the drop in the output statistics. Add `--strict` to enforce strict consistency (the script exits on the first failed sample). If you suspect the original cache was not produced by `--old_tokenizer_model`, add `--allow_fallback_answer` to enable fallback positioning based on `metadata.boxed_answer`.
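
The difference between the default drop behavior and `--strict` can be pictured with a small loop. The names `map_all` and `map_one` are hypothetical and the real script's control flow may differ:

```python
def map_all(records, map_one, strict=False):
    """Map every record; drop failures by default, or raise on the first one if strict.

    `map_one` is a hypothetical per-sample mapper that returns None on failure.
    """
    kept, dropped = [], 0
    for idx, record in enumerate(records):
        mapped = map_one(record)
        if mapped is None:
            if strict:
                raise ValueError(f"sample {idx} could not be mapped")
            dropped += 1  # default: skip the sample and count it
            continue
        kept.append(mapped)
    return kept, dropped


# Toy mapper: fails on negative numbers.
toy_map = lambda x: x * 2 if x >= 0 else None

print(map_all([1, -1, 3], toy_map))  # ([2, 6], 1)
```

With `strict=True`, the same input raises on the second sample instead of returning a shorter list.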

---

## Step 2: Run Llama Attribution Evaluation with exp2 (Data and Output Point to exp5)

Key points:
- **Data reading**: use `--data_root exp/exp5/data` (makes exp2 read the mapped cache)
- **Result output**: use `--output_root exp/exp5/output` (avoids writing to `exp/exp2/output`)
- **Do not add** `--save_hop_traces` (avoids writing traces)
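
Before launching a run, it can help to confirm that every span in the mapped cache is in range for the new tokenizer. The sketch below assumes that `target` and the span fields are top-level keys of each JSONL record, which may not match the actual cache layout, and uses a whitespace token counter as a stand-in for the real tokenizer:

```python
import json

SPAN_KEYS = ("indices_to_explain", "sink_span", "thinking_span")


def find_bad_spans(jsonl_lines, count_tokens):
    """Return (line_no, key, span, n_tokens) for every span that is out of range.

    `count_tokens` is any callable mapping text to a token count.
    """
    bad = []
    for line_no, line in enumerate(jsonl_lines, start=1):
        if not line.strip():
            continue
        record = json.loads(line)
        n_tokens = count_tokens(record["target"])
        for key in SPAN_KEYS:
            span = record.get(key)
            if span is None:
                continue
            start_tok, end_tok = span
            if not (0 <= start_tok <= end_tok < n_tokens):
                bad.append((line_no, key, span, n_tokens))
    return bad


# Toy records; whitespace splitting stands in for the new tokenizer.
lines = [
    json.dumps({"target": "a b c d", "indices_to_explain": [2, 3]}),
    json.dumps({"target": "a b", "indices_to_explain": [1, 2], "sink_span": [0, 0]}),
]
print(find_bad_spans(lines, lambda text: len(text.split())))
# [(2, 'indices_to_explain', [1, 2], 2)]
```

For a real check, `count_tokens` would wrap the new model's tokenizer, and it should tokenize `target` the same way exp2 does (for example with respect to special tokens), which this sketch does not verify.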

### RULER (can run recovery + faithfulness)

```bash
CUDA_VISIBLE_DEVICES=0 .venv/bin/python exp/exp2/run_exp.py \
  --datasets niah_mq_q2 \
  --data_root exp/exp5/data \
  --output_root exp/exp5/output \
  --attr_funcs ifr_all_positions,attnlrp,ifr_multi_hop_both \
  --model_path /opt/share/models/meta-llama/Llama-3.1-8B-Instruct \
  --cuda 0 \
  --num_examples 100 \
  --mode faithfulness_gen,recovery_ruler
```

### math (can only run faithfulness; recovery will be explicitly rejected by exp2)

```bash
CUDA_VISIBLE_DEVICES=0 .venv/bin/python exp/exp2/run_exp.py \
  --datasets math \
  --data_root exp/exp5/data \
  --output_root exp/exp5/output \
  --attr_funcs ifr_all_positions,attnlrp,ifr_multi_hop_both \
  --model_path /opt/share/models/meta-llama/Llama-3.1-8B-Instruct \
  --cuda 0 \
  --num_examples 100 \
  --mode faithfulness_gen
```

## Will This Pollute the exp2 Folder?

- **`exp/exp2/data/` is not polluted**: exp2's cache is not modified; the mapped output goes to `exp/exp5/data/`.
- **No traces are written** unless `--save_hop_traces` is passed.
- Note, however, that `exp/exp2/run_exp.py` **always writes CSV metric files** to `--output_root` (this is exp2's own behavior, which exp5 does not change). To keep new files out of the exp2 folder, point `--output_root` to `exp/exp5/output` (or another directory).

The following example runs both datasets in sequence, with all output going to `exp/exp5/output`:

```bash
python exp/exp2/run_exp.py \
  --datasets niah_mq_q2 \
  --data_root exp/exp5/data \
  --output_root exp/exp5/output \
  --attr_funcs ifr_all_positions,attnlrp,ifr_multi_hop_both \
  --model_path /opt/share/models/meta-llama/Llama-3.1-8B-Instruct \
  --cuda 2,3,4,5,6,7 \
  --num_examples 100 \
  --mode faithfulness_gen \
  --n_hops 1 \
&& python exp/exp2/run_exp.py \
  --datasets math \
  --data_root exp/exp5/data \
  --output_root exp/exp5/output \
  --attr_funcs ifr_all_positions,attnlrp,ifr_multi_hop_both \
  --model_path /opt/share/models/meta-llama/Llama-3.1-8B-Instruct \
  --cuda 2,3,4,5,6,7 \
  --num_examples 100 \
  --mode faithfulness_gen \
  --n_hops 1
```
