# Content

Note: This code base is adapted from (Vu & Nguyen (2025)) which is cited below, such that only the files used during the experiments are kept. 
The rest of them are not included in this code base. This README.md is thus also adapted.

The main source containing:
  - `angular_steering_xxx.ipynb`: analysis and visualization of activations and
    directions, extraction of the steering directions, construction of the steering
    plane. The different suffix (xxx) yields different configurations of angular steering (with Momentum/Adam/Neither). 
    Causal indicates the velocities/moments are computed sequentially, and the ActAdd1p0 attached to it indicates the sequential steering function is ActAdd. It is Directional Ablation otherwise. 
    The Naive Variants do not perform sequential steering.
    The Beta String indicates that coefficients of the momentum/moments.
    The suffix of momentum_baseline is just regular angular steering.
  - `config.py `: Contains a mapping function such that given the configuration, maps the model to the correct file containing the steering vectors.
  - `generate_responsees.py`: generation of steered responses (with our vLLM fork) used for evaluation.
  - `evaluate_jailbreak.py`: evaluation of `substring_matching`, `LlamaGuard 3`,
    `HarmBench`.
  - `eval.sh`: script to run the evaluation of tinyBenchmarks. (with `github.com/EleutherAI/lm-evaluation-harness`)
  - `visualization_all.ipynb`: Computation (+ Visualization) of benchmark results used in the paper.
  - `endpoint.py`: an OpenAI-compatible endpoint server to play with Angular Steering.
  - `norm_length_observation/`: Contains the notebook used to run the experiment for generating the steering vectors of a randomly initialized model.


# Installation

1. Create conda environment:

```bash
conda create -n angular_steering python=3.10
conda activate angular_steering
pip install -r requirements.txt
conda deactivate
cd ..
```

2. Install vLLM from source (following the instructions in the vLLM repo https://docs.vllm.ai/en/stable/getting_started/installation/gpu.html):

```bash
cd ../vllm/
conda activate angular_steering
VLLM_USE_PRECOMPILED=1 pip install --editable .
conda deactivate
cd ..
```

3. Create a separated conda environment for lm-evaluation-harness and install it
   following https://github.com/EleutherAI/lm-evaluation-harness:

```bash
cd ..
conda create -n lm_eval python=3.10
conda activate lm_eval
git clone --depth 1 https://github.com/EleutherAI/lm-evaluation-harness
cd lm-evaluation-harness
pip install -e ."[api]"
pip install git+https://github.com/felipemaiapolo/tinyBenchmarks
conda deactivate
cd ..
```

# Reproducing the results

## Using precomputed results

We provide the precomputed outputs and results in the `output/` and `benchmarks/`
folders. You can directly use the `visualization.ipynb` notebook to visualize the
results.

## Generating the results

1. Run the `angular_steering_xxx.ipynb` notebook for any choice of configuration to extract the steering directions and
   construct the steering plane. This will generate the `output/` folder with the
   extracted steering directions and the steering plane.
2. If new configurations are run, modify the config.py file to include the mapping to file containing the steering vectors.
3. Run the `generate_responsees.py` script to generate the steered responses. Modify the configuration parameters 
   to indicate the desired steering vector associated with the chosen parameters.
4. The `evaluate_jailbreak.py` script will evaluate the steered responses with
   `substring_matching`, `LlamaGuard 3`, `HarmBench`. Some of the
   evaluation requires serving LLMs beforehand.
   - Serve `meta-llama/Llama-Guard-3-8B` using vLLM on port 8898 for `LlamaGuard 3` evaluation.
   - For `substring_matching`, it does not require serving any LLMs.
   - You should edit the `methods` list in the main function of the script to
     include/exclude the methods you want to evaluate.
   - The output will be saved in the `output/` folder.
5. Run the `eval.sh` script in the `lm-eval` environment to evaluate the steered
   generation on tinyBenchmarks.
   - You must first serve run `endpoint.py` script to serve an OpenAI-compatible endpoint
     server with Angular Steering supported. Change the model name in the script to the
     one you want to evaluate.
   - Change the `MODELS` and `port` variables in the `eval.sh` script to the models you want to
     evaluate and the port of the endpoint server.
   - The output will be saved in the `benchmarks/` folder.
   - This might take a while.
6. Run the `visualization.ipynb` notebook to visualize the benchmark results.

---

Thank you for your interest in our work!


## References

```bibtex
@article{vu2025angular,
    title={Angular Steering: Behavior Control via Rotation in Activation Space},
    author={Hieu M. Vu and Tan Minh Nguyen},
    journal={Advances in Neural Information Processing Systems},
    year={2025},
}