# On the Impact of the Utility in Semivalue-based Data Valuation

## Installation
1. Create a virtual environment:
```bash
python -m venv .venv
source .venv/bin/activate
```

2. Install the project in editable mode, together with its dependencies:
```bash
pip install -e . -r requirements.txt
```

## Configuration
Edit `src/config/config.py` to set:
- `DEFAULT_RANDOM_STATE`: random seed used to ensure reproducibility.
- `FIXED_PARAMS`: settings shared by all runs (model, optimizer, criterion, etc.).
- `CHANGING_PARAMS`: list of utility functions to compare.
- `NB_PERMUTATION_SAMPLING`: number of permutations used for semivalue estimation.
- `TRAIN_SIZE`: size (or fraction) of the training set to use.
- `TEST_SIZE`: size (or fraction) of the test set to use.
- `P_VALUES`: list of values of $p$ used to compute the robustness metric $R_p$.
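For orientation, here is a hypothetical sketch of the shape `src/config/config.py` might take. Every value below is a placeholder chosen for illustration, not a setting from the paper, and the string entries in `FIXED_PARAMS` and `CHANGING_PARAMS` stand in for whatever objects or names the real configuration uses. Treating an `int` size as an absolute count and a `float` as a fraction is an assumption borrowed from the usual scikit-learn convention.

```python
# Hypothetical sketch of src/config/config.py; all values are placeholders.

# Seed shared by data splitting, model initialisation and permutation sampling.
DEFAULT_RANDOM_STATE = 42

# Settings held fixed across all runs (placeholder strings, not real objects).
FIXED_PARAMS = {
    "model": "placeholder_model",
    "optimizer": "placeholder_optimizer",
    "criterion": "placeholder_criterion",
}

# Utility functions to compare (hypothetical names).
CHANGING_PARAMS = ["utility_a", "utility_b"]

# Number of random permutations for the Monte Carlo semivalue estimate.
NB_PERMUTATION_SAMPLING = 500

# Assumption: int = number of samples, float in (0, 1] = fraction of the data.
TRAIN_SIZE = 100
TEST_SIZE = 0.2

# Values of p for the robustness metric R_p (placeholders).
P_VALUES = [0.1, 0.2]
```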

## Running the pipeline
1. Compute the semivalue scores and save the results:
```bash
python src/scripts/compute_scores_{...}.py
```
This creates the following files:
```text
results/all_values.pkl
results/all_marg_contrib.pkl
```
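Step 1 estimates semivalues from `NB_PERMUTATION_SAMPLING` random permutations. The sketch below illustrates that idea with the classic permutation-sampling estimator of the Shapley value (one particular semivalue). It is not the code in the `compute_scores` scripts: the function name `permutation_sampling`, the toy utility, and the layout of the recorded marginal contributions (indexed by point and coalition size, loosely echoing the name `all_marg_contrib.pkl`) are all assumptions.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, FrozenSet, List, Tuple

# A utility maps a coalition (set of training-point indices) to a score.
Utility = Callable[[FrozenSet[int]], float]


def permutation_sampling(
    n: int, utility: Utility, nb_permutations: int, seed: int = 0
) -> Tuple[List[float], Dict[int, Dict[int, List[float]]]]:
    """Monte Carlo Shapley estimate via random permutations (illustrative sketch).

    Returns (values, marg_contrib), where marg_contrib[i][k] lists the marginal
    contributions of point i observed when it joined a coalition of size k.
    """
    rng = random.Random(seed)
    totals = [0.0] * n
    marg_contrib: Dict[int, Dict[int, List[float]]] = defaultdict(lambda: defaultdict(list))
    for _ in range(nb_permutations):
        order = list(range(n))
        rng.shuffle(order)
        coalition: FrozenSet[int] = frozenset()
        prev_u = utility(coalition)
        # Add points one at a time in permutation order, recording U(S + i) - U(S).
        for k, i in enumerate(order):
            coalition = coalition | {i}
            u = utility(coalition)
            delta = u - prev_u
            totals[i] += delta
            marg_contrib[i][k].append(delta)
            prev_u = u
    # Under uniform permutations, the mean marginal contribution is the Shapley estimate.
    values = [t / nb_permutations for t in totals]
    return values, marg_contrib
```

With an additive toy utility, every marginal contribution of point `i` equals its weight, so the estimate is exact:

```python
weights = [1.0, 2.0, 3.0]
values, mc = permutation_sampling(3, lambda S: sum(weights[i] for i in S), nb_permutations=50)
print(values)  # [1.0, 2.0, 3.0]
```

Recording contributions by coalition size keeps enough information to reweight them for other semivalues, because under uniform permutations the set of points preceding `i` is equally likely to have any size from 0 to n-1.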

2. Compute the rank correlations:
```bash
python src/scripts/compute_correlation.py
```
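Rank correlation compares how two utilities order the same data points rather than the raw scores. As one illustration, the sketch below implements Kendall's tau-a in plain Python; whether `compute_correlation.py` uses Kendall's tau, Spearman's rho, or another coefficient is not stated here, so treat the choice and the function name `kendall_tau` as assumptions. In practice `scipy.stats.kendalltau` (which computes tau-b by default, handling ties differently) is the usual library call.

```python
from itertools import combinations
from typing import Sequence


def kendall_tau(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-a between two score vectors over the same data points.

    Counts pairs ordered the same way by both vectors (concordant) minus pairs
    ordered oppositely (discordant), divided by the total number of pairs.
    Tied pairs count as neither.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs
```

For example, `kendall_tau([1, 2, 3], [1, 3, 2])` has two concordant pairs and one discordant pair, giving 1/3.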

3. Compute the robustness score $R_p$:
```bash
python src/scripts/compute_robustness.py
```

4. Plot the spatial signatures:
```bash
python src/scripts/plot_spatial_signatures.py
```

5. Plot $r_j$ as a function of $j$, overlaid with $\omega_j$:
```bash
python src/scripts/plot_rj_vs_omegaj.py
```
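The sketch below computes semivalue weights for two standard semivalues, assuming $\omega_j$ follows the common convention in which it is the weight given to each coalition of size $j-1$, i.e. $\phi_i = \sum_{j=1}^{n} \omega_j \sum_{S \subseteq N \setminus \{i\},\, |S| = j-1} \left[ U(S \cup \{i\}) - U(S) \right]$ with $\sum_{j=1}^{n} \binom{n-1}{j-1} \omega_j = 1$. The function names are hypothetical, only Shapley and Banzhaf are covered, and the sketch says nothing about how $r_j$ is computed.

```python
from math import comb


def shapley_weight(n: int, j: int) -> float:
    """Shapley weight on each coalition of size j - 1 among n players."""
    return 1.0 / (n * comb(n - 1, j - 1))


def banzhaf_weight(n: int, j: int) -> float:
    """Banzhaf weight: the same for every coalition, whatever its size."""
    return 1.0 / 2 ** (n - 1)


def total_weight(weight, n: int) -> float:
    """Sum of binom(n-1, j-1) * omega_j over j = 1..n; equals 1 for a semivalue."""
    return sum(comb(n - 1, j - 1) * weight(n, j) for j in range(1, n + 1))


n = 10
omega_shapley = [shapley_weight(n, j) for j in range(1, n + 1)]
omega_banzhaf = [banzhaf_weight(n, j) for j in range(1, n + 1)]
```

Shapley's $\omega_j$ is U-shaped in $j$ (largest for the smallest and largest coalitions), while Banzhaf's is flat, which is the kind of contrast such an overlay makes visible.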