# context-attribution

Example usage:

1. `pip install -e .`
Installs the package in editable mode.

2. `python scripts/preprocess_dataset.py --dataset hotpot_qa --model meta-llama/Meta-Llama-3.1-70B-Instruct --dtype float16 --prompt-template 'Answer the question based on the provided context:\n\nContext:\n{context}\n\nQuestion: {question}' --num-examples 1000 --output-path data/datasets/hotpot_qa_l70`
The script (1) loads the specified dataset, (2) preprocesses the context into a `context_tree` that encodes its hierarchical structure (e.g., documents, paragraphs within documents, sentences within paragraphs), and (3) formats each example according to the `--prompt-template` argument and generates a response with the specified model. It then writes a preprocessed Hugging Face dataset to the specified output path. The dataset has the following form:
```
Dataset({
    features: ['id', 'question', 'answer', 'context_tree', 'response_ids', 'response', 'prompt_template'],
    num_rows: 1000
})
```
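
To make the `context_tree` idea concrete, here is a minimal sketch of a hierarchical context being flattened and inserted into the prompt template. The node layout (`children` / `text` keys) and the helper names `leaves` and `build_prompt` are assumptions for illustration only; the schema the repo actually stores may differ.

```python
# Hypothetical sketch: the real `context_tree` schema in this repo may differ.
PROMPT_TEMPLATE = (
    "Answer the question based on the provided context:\n\n"
    "Context:\n{context}\n\nQuestion: {question}"
)

# Assumed layout: inner nodes hold `children`, leaf nodes hold `text`.
context_tree = {
    "children": [  # documents
        {"children": [  # paragraphs of the first document
            {"children": [  # sentences of the first paragraph
                {"text": "Paris is the capital of France."},
                {"text": "It lies on the Seine."},
            ]},
        ]},
        {"children": [  # paragraphs of the second document
            {"children": [{"text": "Berlin is the capital of Germany."}]},
        ]},
    ]
}


def leaves(node):
    """Yield the leaf texts of a tree in document order."""
    if "text" in node:
        yield node["text"]
    for child in node.get("children", []):
        yield from leaves(child)


def build_prompt(tree, question, template=PROMPT_TEMPLATE):
    """Flatten the tree into one context string and fill the template."""
    context = " ".join(leaves(tree))
    return template.format(context=context, question=question)


prompt = build_prompt(context_tree, "What is the capital of France?")
print(prompt)
```

Keeping the hierarchy separate from the flattened prompt means a source at any level (a document, a paragraph, or a single sentence) can later be removed and the prompt rebuilt.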

3. `python scripts/compute_attributions.py --dataset data/datasets/hotpot_qa_l70 --output-dir data/attributions/l70_kv --dtype float16 --use-cache loo --model-name meta-llama/Llama-3.1-70B`
The script computes attributions with the specified model and attribution method (e.g., leave-one-out (LOO), hierarchical, pruning) and writes the results to the specified output directory.
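
As an illustration of the leave-one-out (LOO) idea, the sketch below scores each context source by how much a score drops when that source is removed. It is not the repo's implementation: `leave_one_out` and `toy_score` are hypothetical names, and the toy word-overlap score stands in for a model-based score (presumably something like the log-probability of the generated response).

```python
def leave_one_out(sources, score_fn):
    """Attribute to each source the score drop caused by removing it."""
    full = score_fn(sources)
    return [
        full - score_fn(sources[:i] + sources[i + 1:])
        for i in range(len(sources))
    ]


def toy_score(sources, response="paris capital france"):
    """Toy stand-in for a model score: count response words found in the context."""
    words = set(" ".join(sources).lower().replace(".", "").split())
    return sum(word in words for word in response.split())


sources = [
    "Paris is the capital of France.",
    "It lies on the Seine.",
    "Berlin is the capital of Germany.",
]

scores = leave_one_out(sources, toy_score)
for source, score in zip(sources, scores):
    print(f"{score:+d}  {source}")
```

With this toy score, only the first sentence receives a nonzero attribution, since it alone carries the words "paris" and "france". LOO needs one extra scoring pass per source.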
