<h1 align='center' style="text-align:center; font-weight:bold; font-size:2.0em;letter-spacing:2.0px;"> Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates </h1>

# Quick Start

1. Run this [notebook](notebook_gpt4/gpt-4-1106-preview_vs_nil.ipynb) to obtain the optimized adversarial string.

2. Run [01_prepare_submission.ipynb](./01_prepare_submission.ipynb) to craft the null model submission from the optimized adversarial string obtained in step 1.
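
For orientation, here is a minimal sketch of what a null model submission can look like: every instruction receives the same constant output, regardless of its content. It is not the notebook's actual code. The function names `build_null_submission` and `save_submission`, the default generator name, and the placeholder string are hypothetical, and the record keys (`instruction`, `output`, `generator`) are an assumption based on the JSON format AlpacaEval accepts for model outputs.

```python
import json
from pathlib import Path


def build_null_submission(instructions, adversarial_string, generator="null_model"):
    """Return one record per instruction, all sharing the same constant output.

    The keys below are assumed to match the AlpacaEval model-output format.
    """
    return [
        {
            "instruction": instruction,
            "output": adversarial_string,  # identical for every instruction
            "generator": generator,
        }
        for instruction in instructions
    ]


def save_submission(records, path):
    """Write the records to `path` as a JSON list."""
    Path(path).write_text(
        json.dumps(records, indent=2, ensure_ascii=False), encoding="utf-8"
    )
```

A call such as `save_submission(build_null_submission(instructions, adversarial_string), "outputs.json")` would then produce a file that can be passed to the evaluation command, where `instructions` is the list of benchmark prompts and `adversarial_string` is the result of step 1.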

# Evaluation

To install the stable release of AlpacaEval 2.0, run:

```bash
pip install alpaca-eval
```

To install the nightly version, run:

```bash
pip install git+https://github.com/tatsu-lab/alpaca_eval
```

Then evaluate a set of model outputs as follows:

```bash
# For more complex configs (e.g. using Azure or switching clients), see client_configs/README.md
export OPENAI_API_KEY=<your_api_key>
alpaca_eval --model_outputs 'example/outputs.json'
```