# Paper Experiments

## Most important
Scripts log in to HuggingFace to download model weights. The HuggingFace token is read from a file named `.env`, which you need to create.
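Loading the token from `.env` can be sketched as follows. This assumes the file holds a line such as `HF_TOKEN=hf_...`. The variable name `HF_TOKEN` is an assumption, so check the scripts for the name they actually read. The small stdlib parser stands in for what packages like `python-dotenv` (`load_dotenv()`) do and is not the repository's code.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Read simple KEY=VALUE lines from a .env file and export them to os.environ."""
    values: dict[str, str] = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes, e.g. HF_TOKEN="hf_..." (hypothetical name).
        values[key.strip()] = value.strip().strip("'\"")
    os.environ.update(values)
    return values
```

With the variables exported, `huggingface_hub.login(token=os.environ["HF_TOKEN"])` would log in before any model weights are downloaded.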

To run the experiments that compute the rhyme metrics:
```
pip install uv
uv venv
uv pip install -r requirements.txt
source .venv/bin/activate
python paper_experiments/rhyme_steering_stages/run_all_experiments.py
```

Running the pipeline to compute all metrics for all models in one go tends to fail with disk-space and/or GPU-memory errors. When rerun after a failure, the scripts resume from the last saved results.
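Resuming after a failure follows a common pattern: skip any unit of work whose output file already exists. The sketch below illustrates that pattern only and is not the repository's implementation. The one-JSON-file-per-item layout and the function name `run_resumable` are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Callable, Iterable


def run_resumable(items: Iterable[str], out_dir: str,
                  compute: Callable[[str], Any]) -> list[str]:
    """Compute and save one result per item, skipping items already saved on disk.

    Returns the items that were actually computed in this run.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    computed = []
    for item in items:
        target = out / f"{item}.json"
        if target.exists():
            continue  # finished by an earlier run that may have crashed later
        result = compute(item)
        # Write to a temporary file first, then rename it into place, so a crash
        # mid-write never leaves a truncated file that looks like a finished result.
        tmp = target.with_name(target.name + ".tmp")
        tmp.write_text(json.dumps(result), encoding="utf-8")
        tmp.replace(target)
        computed.append(item)
    return computed
```

Calling it a second time with the same items computes nothing new, which is what makes a crashed run cheap to restart.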

It is cleaner to run the pipeline for one model at a time. Valid model names are defined in shared_utils.py and can be passed with the `--model` command-line argument, for example:

```
python -m paper_experiments.rhyme_steering_stages.run_all_experiments --model Gemma3_1B
```
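To process several models back to back without holding them all in memory, each model can be run in its own process. This is a sketch that assumes the module path and `--model` flag shown above. `build_command` and `run_models` are hypothetical helpers. Combining `--model` with `--mode` is also an assumption, and `Gemma3_1B` is the only model name confirmed here, since the others live in shared_utils.py.

```python
import subprocess
import sys
from typing import Optional

# Module path taken from the example command above.
MODULE = "paper_experiments.rhyme_steering_stages.run_all_experiments"


def build_command(model: str, mode: Optional[str] = None) -> list[str]:
    """Build the command line for a single-model pipeline run."""
    cmd = [sys.executable, "-m", MODULE, "--model", model]
    if mode is not None:
        cmd += ["--mode", mode]  # assumes --mode can be combined with --model
    return cmd


def run_models(models: list[str], mode: Optional[str] = None) -> list[str]:
    """Run the pipeline once per model, each in a fresh process; return the models that failed."""
    failed = []
    for model in models:
        # A separate process per model releases its GPU memory when it exits.
        if subprocess.run(build_command(model, mode)).returncode != 0:
            failed.append(model)
    return failed
```

For example, `run_models(["Gemma3_1B"])` runs the full pipeline for that model and returns an empty list on success. Any model in the returned list can simply be run again, because the scripts resume from saved results.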

For specific-word steering, you could in principle run:
```
source .venv/bin/activate
python paper_experiments/data_preparation/run_filter_suggestiveness_all_models.py
python paper_experiments/rhyme_steering_stages/run_all_experiments.py --mode specific_word_steering
```
However, for the words I tried, no lines (or almost none) are suggestive enough to pass the filter.

## File Overview
- data/test/rhyme_family_lines.json: Lines used as generation prompts in rhyme-family steering.
- data/test/specific_word_lines.json: Lines used as generation prompts in specific-word steering.
- data/test/specific_word_pairs.json: Hand-written lines that could each lead up to two different words; I am not really satisfied with them.
- data/train/rhyme_family_lines.json: Lines used to estimate the steering vector for rhyme-family steering.
- data/train/specific_word_lines.json: Lines used to estimate the steering vector for specific-word steering.
- rhyme_steering_stages/stage_line_generation.py: Generates the lines for the steering experiments.
- rhyme_steering_stages/stage_standard_metrics.py: Calculates the metrics that do not require probabilities (correct fraction, correct regeneration fraction).
- rhyme_steering_stages/stage_prob_based_metrics.py: Calculates the metrics that require probabilities (KL divergence, top-1 difference).
- rhyme_steering_stages/combination.py: Creates combined_results.json.
- rhyme_steering_stages/run_all_experiments.py: Runs all the scripts above as a pipeline.
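The two probability-based metrics can be illustrated on a pair of next-token distributions, for example unsteered vs. steered. This is a sketch under assumptions. The direction KL(unsteered ‖ steered) is a guess, and "top-1 difference" is read here as whether the most likely token changes. stage_prob_based_metrics.py may define or aggregate both differently.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) in nats between two next-token distributions over the same vocabulary."""
    # Terms with p_i == 0 contribute nothing; eps guards against log(x / 0).
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def _argmax(dist: Sequence[float]) -> int:
    """Index of the most likely token."""
    return max(range(len(dist)), key=dist.__getitem__)


def top1_changed(p: Sequence[float], q: Sequence[float]) -> bool:
    """True if the most likely next token differs between p and q."""
    return _argmax(p) != _argmax(q)
```

For `p = [0.5, 0.3, 0.2]` and `q = [0.3, 0.5, 0.2]`, the top-1 token changes and the KL divergence is 0.2 · ln(5/3), about 0.10 nats.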

Additional custom scripts in rhyme_steering_stages/:
- stage_unsteered_regeneration_combined.py: Saves detailed data from unsteered regeneration for further analysis; its output feeds into visualize_regeneration_across_models.py.
- visualize_regeneration_across_models.py: Creates the regeneration visuals, including the comparison with baselines.
- stage_prob_based_metrics_raw.py: Like stage_prob_based_metrics.py, but saves the raw KL and top-1 values that the main script does not keep, for inspection.
- sample_prob_pairs.py: Samples an example from the outputs of stage_prob_based_metrics_raw.py to illustrate the difference between the KL and top-1 metrics.
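Such an example shows that the two metrics can disagree. Steering can shift a lot of probability mass (high KL) without changing the top-1 token, or flip the top-1 token of a near-tie with little overall shift. Below is a hypothetical way to find both cases in the raw outputs. It assumes each record is a dict with `kl` and `top1_changed` fields, and these field names are not taken from the scripts. It also picks the extremes deterministically, whereas sample_prob_pairs.py samples.

```python
from typing import Optional


def pick_contrasting_examples(records: list[dict]) -> tuple[Optional[dict], Optional[dict]]:
    """Pick two records that show how KL and top-1 can disagree.

    Returns (highest-KL record whose top-1 token is unchanged,
             lowest-KL record whose top-1 token changed); either may be None.
    """
    same_top1 = [r for r in records if not r["top1_changed"]]
    changed_top1 = [r for r in records if r["top1_changed"]]
    high_kl_same = max(same_top1, key=lambda r: r["kl"], default=None)
    low_kl_changed = min(changed_top1, key=lambda r: r["kl"], default=None)
    return high_kl_same, low_kl_changed
```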