# FlashTrace Long Context Timing Experiment (exp1)

Self-contained script: `exp/exp1/run_time_curve.py`
Purpose: Measure wall-clock time and peak GPU memory of several attribution methods at different context lengths on a single RULER sample. The results feed the linear-growth table in the paper.

## Method Coverage
- `IG` (20 steps)
- `attention_I_G` (attention * IG)
- `attnlrp` (single backward pass LRP version)
- `perturbation_all` (log-loss ablation)
- `perturbation_CLP` (KL version)
- `perturbation_REAGENT` (MLM replacement; LED is limited to 4096 tokens, so longer inputs may fail)
- `ifr_all_positions` (IFR baseline that attributes one position at a time; `sink_chunk_tokens` is fixed at 1)
- `ifr_multi_hop` (FlashTrace; supports multi-hop and chunking)
- `ifr_multi_hop_both` (FT-IFR both: stop_words + in_all_gen; supports multi-hop and chunking)

## Running Example
```bash
# Defaults: input lengths 1024,4096,8192; output lengths 32,256,512; 3 repeats per cell.
# The flags below override those defaults.
python exp/exp1/run_time_curve.py \
  --model qwen-8B \
  --model_path /opt/share/models/Qwen/Qwen3-8B/ \
  --cuda 2,3,4,5,6,7 \
  --attr_funcs perturbation_all,perturbation_REAGENT,ifr_all_positions,perturbation_CLP,ifr_multi_hop,ifr_multi_hop_both,attnlrp \
  --input_lengths 10 \
  --output_lengths 2000,5000,10000 \
  --repeats 1 \
  --chunk_tokens 128 \
  --sink_chunk_tokens 32 \
  --catch_oom \
  --ruler_file data/ruler_multihop/8192/vt_h10_c1/validation.jsonl
```
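The flags in the command above define a grid of cells. As a rough sketch, assuming every method in `--attr_funcs` is run on the full Cartesian product of `--input_lengths` and `--output_lengths` with `--repeats` repetitions (this README says "per cell" but does not spell the loop out), the command would schedule 21 runs. The helper `enumerate_cells` and its dict keys are hypothetical and are not taken from `run_time_curve.py`.

```python
from itertools import product


def enumerate_cells(attr_funcs, input_lengths, output_lengths, repeats):
    """Yield one dict per planned run (hypothetical helper, not from the script)."""
    for attr, n_in, n_out, rep in product(
        attr_funcs, input_lengths, output_lengths, range(repeats)
    ):
        yield {
            "attr": attr,
            "input": n_in,
            "output": n_out,
            "total": n_in + n_out,  # each cell's total = input + output
            "repeat": rep,
        }


# Values copied from the example command.
attr_funcs = (
    "perturbation_all,perturbation_REAGENT,ifr_all_positions,"
    "perturbation_CLP,ifr_multi_hop,ifr_multi_hop_both,attnlrp"
).split(",")
cells = list(enumerate_cells(attr_funcs, [10], [2000, 5000, 10000], repeats=1))
print(len(cells))  # 7 methods x 1 input length x 3 output lengths x 1 repeat = 21
```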

Output:
- `exp/exp1/out/time_curve_runs.jsonl`: Raw record for each run (attr, target input/output/total, actual length, time, peak_mem, status).
- `exp/exp1/out/time_curve_summary.csv`: Per-method summary grouped by target input/output length, with mean and variance (also records total = input + output).
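For ad hoc analysis, the raw JSONL can be re-aggregated by hand. The sketch below groups records by method and target lengths and computes the mean and variance of the run time. The key names (`attr`, `input`, `output`, `time`, `status`), the `"ok"` status value, and the use of sample variance are all assumptions made for illustration; check them against an actual record and against the script's CSV before relying on the numbers.

```python
import json
import statistics
from collections import defaultdict


def summarize(jsonl_lines):
    """Group successful runs by (attr, input, output); key names are assumed."""
    groups = defaultdict(list)
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        rec = json.loads(line)
        if rec.get("status") != "ok":  # assumed success marker
            continue
        groups[(rec["attr"], rec["input"], rec["output"])].append(rec["time"])

    rows = []
    for (attr, n_in, n_out), times in sorted(groups.items()):
        rows.append({
            "attr": attr,
            "input": n_in,
            "output": n_out,
            "total": n_in + n_out,
            "n": len(times),
            "time_mean": statistics.mean(times),
            # Sample variance needs at least two points; fall back to 0.0.
            "time_var": statistics.variance(times) if len(times) > 1 else 0.0,
        })
    return rows


# Synthetic toy records, not real measurements.
sample = [
    '{"attr": "ifr_multi_hop", "input": 10, "output": 2000, "time": 1.0, "status": "ok"}',
    '{"attr": "ifr_multi_hop", "input": 10, "output": 2000, "time": 3.0, "status": "ok"}',
    '{"attr": "attnlrp", "input": 10, "output": 2000, "time": null, "status": "oom"}',
]
rows = summarize(sample)
print(rows[0]["time_mean"], rows[0]["time_var"])  # 2.0 2.0
```

To run it on the real file, pass an open file handle, for example `summarize(open("exp/exp1/out/time_curve_runs.jsonl"))`.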

## Notes
- `--input_lengths` sets the prompt (user prompt) length and `--output_lengths` sets the output (sink) length; each cell's total is input + output.
- Compatibility: the deprecated `--total_lengths/--lengths` flags are still supported. They give the combined prompt + output length, and the prompt length is derived as the total minus the output length.
- `--target_text` is repeated to reach the target output length. It is used only for length control, so its semantics do not matter.
- `--catch_oom/--no-catch-oom` chooses between recording an OOM as the run's status and continuing, or raising the error and aborting.
- Multi-GPU: `--cuda 0,1` sets `CUDA_VISIBLE_DEVICES` before the script starts and loads the model sharded with `device_map=balanced`; for a single GPU, pass `--cuda 0`.
- Runs that exceed the model context (`config.max_position_embeddings`) are marked `skipped_model_ctx`. The check uses the actual token count fed to the model: formatted prompt + output (+ EOS).
- The Longformer used by `perturbation_REAGENT` supports only 4096 tokens; longer inputs may be reported as OOM or `runtime_error`.
- The IFR multi-hop methods accept `--chunk_tokens/--sink_chunk_tokens` to force chunking on very long contexts, which lowers memory use at the cost of slightly more time; the `ifr_all_positions` branch fixes `sink_chunk_tokens=1`.
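
The status handling described in these notes can be sketched as follows. This is a minimal illustration, not the code in `run_time_curve.py`: `run_cell` is a hypothetical helper, the `"ok"` and `"oom"` status strings are assumed (only `skipped_model_ctx` and `runtime_error` are named above), and taking the maximum peak over all visible GPUs is one plausible choice for sharded models. PyTorch is optional here, so the sketch also runs on a machine without it.

```python
import time

try:
    import torch
except ImportError:  # sketch still runs without PyTorch; peak_mem is then None
    torch = None


def run_cell(fn, n_tokens, max_ctx, catch_oom=True):
    """Time one attribution call and classify the outcome (hypothetical helper)."""
    if n_tokens > max_ctx:
        return {"status": "skipped_model_ctx", "time": None, "peak_mem": None}

    devices = []
    if torch is not None and torch.cuda.is_available():
        devices = list(range(torch.cuda.device_count()))
    for d in devices:
        torch.cuda.reset_peak_memory_stats(d)
        torch.cuda.synchronize(d)

    start = time.perf_counter()
    try:
        fn()
        for d in devices:
            torch.cuda.synchronize(d)  # wait for queued kernels before stopping the clock
    except RuntimeError as exc:
        if not catch_oom:
            raise  # abort the whole experiment
        is_oom = "out of memory" in str(exc).lower()
        status = "oom" if is_oom else "runtime_error"
        return {"status": status, "time": None, "peak_mem": None}
    elapsed = time.perf_counter() - start

    # Assumption: report the largest per-device peak, in bytes.
    peak = max((torch.cuda.max_memory_allocated(d) for d in devices), default=None)
    return {"status": "ok", "time": elapsed, "peak_mem": peak}


def fake_oom():
    raise RuntimeError("CUDA out of memory")  # stand-in for a real allocation failure


print(run_cell(lambda: None, n_tokens=5000, max_ctx=4096)["status"])  # skipped_model_ctx
print(run_cell(fake_oom, n_tokens=100, max_ctx=4096)["status"])       # oom
```

Synchronizing before reading the clock matters because CUDA kernels are launched asynchronously; without it the measured time can end before the GPU work does.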
