# ICLR Submission: CAB - Framework

## Installation

We recommend using [mamba](https://mamba.readthedocs.io/en/latest/) (or conda) to create the conda environment. You can create the environment using the provided `environment.yaml` file:

```bash
mamba env create -f environment.yaml
conda activate bias-env
```


## Usage

All our code can be run using the `main.py` script together with the corresponding configuration files in the `configs` directory. For example, we can run our algorithm for the attribute gender using Nous Hermes 70B with the following command:

```bash
(bias-env) python main.py --config configs/run/gender/run_gender_nous70b.yaml
```

Please note that, depending on the model provider, you might need to set up authentication tokens as environment variables. For example, to cover all models used in our work, you can add the following keys to your `.bashrc` or `.zshrc`:

```bash
export OPENAI_API_KEY="your_api_key_here"
export TOGETHER_API_KEY=""
export ANTHROPIC_API_KEY=""
export INVARIANT_API_KEY=""
export OPENROUTER_API_KEY=""
```
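Before launching a long run, it can help to confirm that the keys are actually visible to the Python process. The following is a minimal sketch and not part of the repository: the helper name `missing_api_keys` is hypothetical, and which keys a given config really needs depends on the model provider it uses.

```python
import os

# Key names taken from the export list above; a given run may need only a subset.
API_KEYS = [
    "OPENAI_API_KEY",
    "TOGETHER_API_KEY",
    "ANTHROPIC_API_KEY",
    "INVARIANT_API_KEY",
    "OPENROUTER_API_KEY",
]


def missing_api_keys(env=None, required=API_KEYS):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_api_keys()
    if missing:
        print("Unset or empty keys:", ", ".join(missing))
    else:
        print("All listed API keys are set.")
```

Passing a shorter `required` list restricts the check to the providers a particular config relies on.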

- We provide configs for all our major runs in the `configs/run` directory.
- Additionally, you can find configs for running evaluations in the `configs/model_eval` directory. These currently point to the evaluation of the implicit set of CAB (`cab_implicit/gender/transformed_questions.jsonl`) but can easily be adapted to evaluate the explicit set of CAB or any other dataset. When running evaluations, we commonly split them into multiple runs with several models per run. The code automatically stores the results accordingly and allows for joint loading and plotting.
- All name transformation configs for the implicit CAB can be found in the `implicit/transformations` directory.

Human scoring was done via Gradio using the `human_scoring.py` script. You can launch the human scoring interface using the following command:

```bash
(bias-env) python human_scoring.py
```


## Plotting

We provide two major utilities to visualize the results of running our algorithm or the evaluation of CAB.

### Visualization Dashboard

The visualization dashboard allows you to interactively explore the results of the bias analysis. You can launch the dashboard using the following command:

```bash
(bias-env) python visualization/bias_visualization_dashboard.py <path_to_run_folder>
```

This starts a local server that you can access via your web browser. The dashboard is built using Plotly Dash and interactively shows all major statistics from the algorithm run as well as the evaluations.

### Plotting Utilities

In addition to the dashboard, we provide several plotting utilities for generating specific visualizations used in the paper. These can be found in the `visualization` directory. In particular, most of our plots were generated using the following commands (all others are created similarly with plotting scripts in the `visualization` directory):

```bash
# Example usage for plotting bias categories
python visualization/multi_fitness_plots.py --attr_paths "gender:cab/gender, race:cab/race, religion:cab/religion" --plot_diff --compare_attr_paths "gender:cab_implicit/gender, race:cab_implicit/race, religion:cab_implicit/religion"

python visualization/domain_wordcloud.py --attr_paths "gender:cab/gender, race:cab/race, religion:cab/religion" --domain_map_json "cab_domain.json" --superdomain_map_json "cab_superdomains.json" --output_dir "plots" --domain_col "domain" --superdomain_col "superdomain" --top_n 300

python visualization/qa_length_stats.py --attr_paths "gender:cab/gender, race:cab/race, religion:cab/religion" --output_dir plots

python visualization/bias_categories.py --run_path cab/gender --categories_file cab/metadata/typical_gender.json --output_dir reports --bias_attribute gender
python visualization/bias_categories.py --run_path cab/race --categories_file cab/metadata/typical_race.json --output_dir reports --bias_attribute race
python visualization/bias_categories.py --run_path cab/religion --categories_file cab/metadata/typical_religion.json --output_dir reports --bias_attribute religion

python visualization/simple_fitness_plot.py --run_path cab/gender --output_dir plots --bias_attribute gender
python visualization/simple_fitness_plot.py --run_path cab/race --output_dir plots --bias_attribute race
python visualization/simple_fitness_plot.py --run_path cab/religion --output_dir plots --bias_attribute religion

python visualization/plot_fitness_by_judge.py --run_path cab/gender/model_rejudge
```
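The per-attribute commands above differ only in the attribute name, so they could also be generated in a loop. The sketch below is an illustration rather than part of the repository (the helper `build_commands` is hypothetical); it only assembles the `bias_categories.py` and `simple_fitness_plot.py` command lines already listed above and prints them, without executing anything.

```python
import shlex

# Attributes used in the commands above.
ATTRIBUTES = ["gender", "race", "religion"]


def build_commands(attributes=ATTRIBUTES):
    """Assemble the per-attribute plotting commands as argv lists."""
    commands = []
    for attr in attributes:
        commands.append([
            "python", "visualization/bias_categories.py",
            "--run_path", f"cab/{attr}",
            "--categories_file", f"cab/metadata/typical_{attr}.json",
            "--output_dir", "reports",
            "--bias_attribute", attr,
        ])
        commands.append([
            "python", "visualization/simple_fitness_plot.py",
            "--run_path", f"cab/{attr}",
            "--output_dir", "plots",
            "--bias_attribute", attr,
        ])
    return commands


if __name__ == "__main__":
    # Print the commands for inspection instead of running them.
    for argv in build_commands():
        print(shlex.join(argv))
```

To execute the commands instead of printing them, each list can be passed to `subprocess.run(argv, check=True)` from the repository root.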

Depending on the current working directory, you might need to adjust the paths accordingly. For plotting in particular, you may sometimes need to adjust an import at the top of `visualization/bias_visualization_dashboard.py`, as its utilities are used in other places as well; the same holds for `visualization/vis_utilities.py`. In both cases, simply add or remove the `visualization.` prefix in the import.
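One way to avoid editing the import by hand is to try both module paths at runtime. The sketch below is a generic pattern rather than code from the repository: the helper `import_first` is hypothetical, and the two module names in the usage comment are inferred from the file path `visualization/vis_utilities.py`.

```python
import importlib


def import_first(*module_names):
    """Import and return the first module in `module_names` that resolves."""
    errors = []
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError as exc:
            # Remember why this candidate failed and try the next one.
            errors.append(f"{name}: {exc}")
    raise ImportError("none of the candidates could be imported: " + "; ".join(errors))


# Assumed usage inside a plotting script (module names inferred from the file layout):
# vis_utilities = import_first("visualization.vis_utilities", "vis_utilities")
```

Note that this also falls back when the first module exists but fails on one of its own imports; the collected error messages make that case visible if every candidate fails.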

## CAB

CAB and its implicit variant can be found in the `cab` and `cab_implicit` directories, respectively. We include full evaluation results alongside (`model_evals`) in a format that can be used with our plotting scripts. Additionally, we provide metadata for CAB, including an easier-to-read overview of the questions and a bias categorization across all models. The individual questions are listed in the respective `.jsonl` files in each subdirectory.
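Since the questions are stored as JSON Lines, they can be inspected with a few lines of Python. This is a minimal sketch: `load_jsonl` is a hypothetical helper, and no particular field names are assumed, since the record schema is not described here. The path in the usage comment is the implicit gender set mentioned in the Usage section.

```python
import json
from pathlib import Path


def load_jsonl(path):
    """Read a JSON Lines file into a list, skipping blank lines."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


# Example (path taken from the Usage section; adjust to your working directory):
# questions = load_jsonl("cab_implicit/gender/transformed_questions.jsonl")
# print(len(questions), "records")
```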