# LM Simulation Benchmark

This repository contains the code used to run the experiments for the paper *ALMANACS: A Simulatability Benchmark for Language Model Explainability*.

## Setup

Install with `pip install -e .`.

If you are using OpenAI models, set your API key with `export OPENAI_API_KEY=<api key>`.
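Before launching a long run that calls OpenAI models, it can help to confirm that the key is actually visible to Python. The snippet below is a minimal sketch and not part of this repository; `has_openai_key` is a hypothetical helper name.

```python
import os


def has_openai_key(env=None):
    """Return True if OPENAI_API_KEY is set to a non-empty value.

    `env` defaults to the current process environment; passing a dict
    makes the check easy to exercise without touching real settings.
    """
    env = os.environ if env is None else env
    return bool(env.get("OPENAI_API_KEY", "").strip())
```

Calling `has_openai_key()` in the shell session you plan to run experiments from shows whether the `export` above took effect.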

## Overview

This repo uses Hydra for configuration. Config files are found in `config/`.
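Hydra applications generally accept `key=value` overrides on the command line in addition to the values in their YAML files. The sketch below only builds such a command string. `hydra_command` is a hypothetical helper, and the `model` key is a placeholder assumption, so check the files in the config directory for the keys this repo actually defines.

```python
import shlex


def hydra_command(script, overrides=None):
    """Build `python <script> key=value ...` using Hydra's override syntax."""
    parts = ["python", script]
    for key, value in (overrides or {}).items():
        parts.append(f"{key}={value}")
    # shlex.join quotes any argument that a shell would otherwise split.
    return shlex.join(parts)


# `model` is a placeholder key, not necessarily one defined in this repo.
example = hydra_command("create_dataset.py", {"model": "vicuna-7b-1.3"})
```

Building the command as a string keeps the sketch free of side effects; nothing is executed until you paste the result into a shell.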

### Tasks

The tasks in the ALMANACS benchmark are `aita`, `advanced_ai_risk`, `harmful_requests`, `hiring_decisions`, `llm_goals`, `misinformation`, `moral_dilemmas`, `power_seeking`, `self_preservation`, `situational_awareness`, `strange_hypotheticals`, and `sycophancy`. Templates for other topics can be generated by providing example questions in `task_data.json` and creating a task config in `config/tasks`.

### Models

Experiments for the ALMANACS paper were run with `flan-alpaca-gpt4-xl` and `vicuna-7b-1.3`.

Experiments with other models can be run by adding model configs to the `config/models` directory. Hugging Face language models can be added with small modifications to `lm_understanding/models/hf_model.py`.

### Baselines

Code for the baseline methods can be found in `lm_understanding/baselines`.

### Directory Structure

- `lm_understanding/` is the primary code directory.
- `templates/` contains the full set of templates generated by GPT-4.
- `baseline_results/` contains results from evaluating methods for predicting model behavior.

Note that the datasets, model behavior, and explanations can be generated with `python create_dataset.py`, `python model_behavior.py`, and `python create_explanations.py`, respectively. Running these scripts generates two directories:

- `datasets/` contains the filtered datasets of questions over which we evaluated model behavior, along with metadata about the template filtering run.
- `model_behavior_results/` contains the ALMANACS datasets of questions and the model behavior for those questions.

We have omitted these directories in this version to adhere to the maximum file size for the Supplementary Material.
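The three generation scripts named above can be chained from a small driver. This is a sketch rather than a script shipped with the repo: `PIPELINE`, `pipeline_commands`, and `run_pipeline` are hypothetical names, the order simply follows the order in which the scripts are listed, and each script is assumed to run with its default config from the repository root.

```python
import subprocess
import sys

# Script names taken from this README, in the order they are listed.
PIPELINE = ["create_dataset.py", "model_behavior.py", "create_explanations.py"]


def pipeline_commands(python=sys.executable):
    """Return one argument list per script, ready for subprocess.run."""
    return [[python, script] for script in PIPELINE]


def run_pipeline():
    """Run each script in turn, stopping at the first failure."""
    for cmd in pipeline_commands():
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(cmd, check=True)
```

Call `run_pipeline()` from the repository root to regenerate the omitted directories. Stopping at the first failure avoids collecting model behavior for a dataset that was never written.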

## Dataset Creation

### Template Creation

Question templates are generated with GPT-4 by running `create_templates.py`.

### Model-Specific Dataset Selection

Create a model-specific dataset with `python create_dataset.py`, with config controlled by `config/create_dataset.yaml`. This will create a dataset of questions adversarially filtered for baseline performance on predicting the given model.

Available models are listed in `config/models`.

### Model Behavior Collection

After generating a dataset of questions, model behavior can be measured using `model_behavior.py`.

Behavior for a synthetic linear model can be generated with `scripts/synthetic_data/synthetic_linear_model.py`.

## Method Evaluation

To evaluate a method, use `run_baseline.py`. Baseline config can be controlled via `config/run_baseline.yaml` and the baseline config files in `config/baseline`.

Explanations can be generated using `create_explanations.py`, controlled by the config `config/create_explanations.yaml`. Salience-based explanations can be generated using the scripts in `scripts/salience`.

## Other Scripts

Other helpful scripts can be found in the `scripts` directory, including those for making plots and running model capabilities evaluations.
