# exp/proc_1 (exp2 Trace Mapping/Export v1)

This directory contains tools that convert the traces saved by `exp/exp2/run_exp.py --save_hop_traces` into simplified per-sample `.npz` files for external use (v1).

Differences from `exp/proc/`:
- `tok` (per-token text fragments) is removed.
- `length` is added: three token counts in the order `[in, cot, out]`, guaranteed to match `span_in`/`span_cot`/`span_out`.
- `hop` follows a default strategy: it is written only when `vh` exists in the trace sample; otherwise it is omitted without raising an error.
- All run directories under `exp/exp2/output/traces/` (every dataset-method combination) can be processed in one batch.

---

## Input Structure (exp2 traces)

exp2 trace run directories look like:
- `exp/exp2/output/traces/<dataset>/<model>/<run_tag>/`

Each run directory contains:
- `manifest.jsonl` (one sample record per line, including `file=ex_*.npz`)
- `ex_*.npz` (one npz per sample)
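
As a rough illustration of this layout, the sketch below walks one run directory by reading `manifest.jsonl` line by line and resolving each record's npz path. The helper name `iter_trace_samples` is hypothetical, and it assumes each manifest record stores the sample filename under a `file` key, as described above; any other manifest fields are passed through untouched.

```python
import json
from pathlib import Path


def iter_trace_samples(run_dir):
    """Yield (manifest_record, npz_path) pairs for one run directory.

    Hypothetical helper, not part of the repository.
    """
    run_dir = Path(run_dir)
    with open(run_dir / "manifest.jsonl", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines in the manifest
            record = json.loads(line)
            # Assumption: the sample filename lives under the "file" key.
            yield record, run_dir / record["file"]
```

Each yielded path can then be opened with `numpy.load` to inspect the raw trace arrays.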

---

## Output Location and Naming

By default, output is written to:
- `exp/proc_1/output/<same relative path as under traces/>/`

For example, given the input directory:
- `.../output/traces/exp/exp2/data/math.jsonl/qwen-8B/<run_tag>/`

the default output directory is:
- `exp/proc_1/output/exp/exp2/data/math.jsonl/qwen-8B/<run_tag>/`
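
The path mirroring described above can be sketched as follows. This is only an illustration of the mapping, not the script's actual code: the function name `default_output_dir` and its `out_root` parameter are hypothetical.

```python
from pathlib import Path


def default_output_dir(trace_dir, traces_root, out_root="exp/proc_1/output"):
    """Mirror the part of trace_dir below traces_root under out_root.

    Hypothetical helper illustrating the documented default layout.
    """
    # relative_to() is purely lexical, so the directories need not exist.
    rel = Path(trace_dir).relative_to(Path(traces_root))
    return Path(out_root) / rel


out = default_output_dir(
    "exp/exp2/output/traces/exp/exp2/data/math.jsonl/qwen-8B/run_a",
    "exp/exp2/output/traces",
)
print(out.as_posix())
```

Here `run_a` is a made-up run tag. `relative_to` raises `ValueError` when `trace_dir` is not located under `traces_root`, which makes a misconfigured root easy to spot.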

---

## Output `.npz` Fields

Each output sample `.npz` contains only the following keys:
- `attr`: `float32[L]`, the attribution vector for the row; covers the valid tokens of `input+cot+output` (the trailing EOS from generation is removed).
- `hop`: `float32[H, L]` (optional), written only when `vh` exists in the trace npz; EOS is removed here as well, so its length matches `attr`.
- `span_in`: `int64[2]`, closed interval `[start, end]` of the input within the vector.
- `span_cot`: `int64[2]`, closed interval `[start, end]` of the cot within the vector (`[-1, -1]` if there is no cot).
- `span_out`: `int64[2]`, closed interval `[start, end]` of the output within the vector.
- `length`: `int64[3]`, in the order `[in, cot, out]`; corresponds exactly to `span_*` (a closed interval has length `end-start+1`, an empty span has length 0).
- `rise`: `float64`, RISE for the row (faithfulness metric).
- `mas`: `float64`, MAS for the row (faithfulness metric).
- `recovery`: `float64`, Recovery@10% for the row (`NaN` if no recovery value is available).
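
For consumers of these files, the following minimal sketch shows how the closed-interval spans relate to `length` and how `attr` can be split into its three segments. The helper names are hypothetical, and the code assumes that an empty span is always encoded with a negative start (as in `[-1, -1]`). It accepts any mapping with the keys listed above, so a plain dict works as well as a loaded npz.

```python
SEGMENTS = ("in", "cot", "out")


def span_length(span):
    """Length of a closed interval [start, end]; a negative start means empty."""
    start, end = int(span[0]), int(span[1])
    return 0 if start < 0 else end - start + 1


def lengths_from_spans(sample):
    """Recompute [in, cot, out] token counts from span_in/span_cot/span_out."""
    return [span_length(sample["span_" + name]) for name in SEGMENTS]


def lengths_match(sample):
    """Check that the stored `length` agrees with the spans."""
    return [int(v) for v in sample["length"]] == lengths_from_spans(sample)


def split_attr(sample):
    """Return the `attr` slice belonging to each segment."""
    parts = {}
    for name in SEGMENTS:
        start, end = (int(v) for v in sample["span_" + name])
        # Closed interval, so the end index is inclusive.
        parts[name] = sample["attr"][start:end + 1] if start >= 0 else sample["attr"][:0]
    return parts


def has_hop(sample):
    """`hop` is optional, so test for it before reading."""
    return "hop" in sample
```

With NumPy, the object returned by `np.load(path)` can be passed to these functions directly. When `hop` is present, the same spans slice its last axis, for example `hop[:, start:end + 1]`.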

---

## Usage Examples

Process all runs under traces (recommended):
```bash
python exp/proc_1/map_exp2_traces_to_proc_1.py \
  --traces_root exp/exp2/output/traces
```

Process only a single run directory:
```bash
python exp/proc_1/map_exp2_traces_to_proc_1.py \
  --trace_dir exp/exp2/output/traces/exp/exp2/data/math.jsonl/qwen-8B/ifr_multi_hop_both_n1_mfaithfulness_gen_100ex
```

Debug: process only the first 5 samples of each run and allow existing output to be overwritten:
```bash
python exp/proc_1/map_exp2_traces_to_proc_1.py \
  --traces_root exp/exp2/output/traces \
  --limit 5 \
  --overwrite
```
