# Artifact for Reflexion

This is the artifact for Reflexion. It is divided into four sections, each with its own folder.

## Requirements and Preliminary Setup
Every run requires an OpenAI API key to be set as an environment variable.
```bash
export OPENAI_API_KEY=<api key>
```
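Because every run depends on this variable, it can help to fail early when it is missing. The following is a minimal sketch of such a check in Python; the helper name `require_openai_key` is hypothetical and is not part of the artifact.

```python
import os


def require_openai_key(env=None):
    """Return the OpenAI API key from the environment, or raise if it is missing.

    `env` defaults to os.environ; passing a dict makes the check easy to test.
    This helper is an illustrative assumption, not code shipped with the artifact.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; export it before running.")
    return key
```

Calling `require_openai_key()` at the top of a script or notebook surfaces a missing key before any experiment starts.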

Individual sections may require additional setup (for example, `cargo` for the Rust programming agent).
These additional dependencies are listed below.

## AlfWorld (decision-making)

Move into the AlfWorld directory and run `make`:

```bash
cd ./alfworld_runs && make
```

To run the simple run (baseline):

```bash
./run_simple.sh
```
The results will be logged in `./root/simple_run_logs`.

To run the Reflexion run:

```bash
./run_reflexion.sh
```

The results will be logged in `./root/reflexion_run_logs`.

We include all of the log files from the experiments in the paper in `./root`.

## WebShop (decision-making; featured only in the supplementary information)

Move into the WebShop directory and run `make`:

```bash
cd ./webshop_runs && make
```

To run the simple run (baseline):

```bash
./run_simple.sh
```
The results will be logged in `./root/simple_run_logs`.

To run the Reflexion run:

```bash
./run_reflexion.sh
```

The results will be logged in `./root/reflexion_run_logs`.

We include all of the log files from the experiments in the paper in `./root`.

## HotPotQA (reasoning)

Move into the HotPotQA directory and run `make`:

```bash
cd ./hotpotqa_runs && make
```

Each notebook in `./notebooks` lets you specify the reflexion strategy used by the agents. The available strategies, defined in an `Enum`, are:

 - `ReflexionStrategy.NONE` - The agent is not given any information about its last attempt. 

 - `ReflexionStrategy.LAST_ATTEMPT` - The agent is given its reasoning trace from its last attempt on the question as context.

 - `ReflexionStrategy.REFLEXION` - The agent is given its self-reflection on the last attempt as context. 

 - `ReflexionStrategy.LAST_ATTEMPT_AND_REFLEXION` -  The agent is given both its reasoning trace and self-reflection on the last attempt as context.
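The four strategies above differ only in what the agent sees from its previous attempt. The sketch below illustrates that difference; the member names come from the list above, while the `auto()` values and the `build_context` helper are assumptions made for illustration and may not match the artifact's actual definitions.

```python
from enum import Enum, auto


class ReflexionStrategy(Enum):
    # Member names follow the list above; the auto() values are placeholders
    # and are not claimed to match the artifact's own Enum values.
    NONE = auto()
    LAST_ATTEMPT = auto()
    REFLEXION = auto()
    LAST_ATTEMPT_AND_REFLEXION = auto()


def build_context(strategy, last_trace, reflection):
    """Hypothetical helper: choose what the agent is shown from its last attempt."""
    parts = []
    # Include the previous reasoning trace for the two strategies that use it.
    if strategy in (ReflexionStrategy.LAST_ATTEMPT,
                    ReflexionStrategy.LAST_ATTEMPT_AND_REFLEXION):
        parts.append(last_trace)
    # Include the self-reflection for the two strategies that use it.
    if strategy in (ReflexionStrategy.REFLEXION,
                    ReflexionStrategy.LAST_ATTEMPT_AND_REFLEXION):
        parts.append(reflection)
    # NONE adds nothing, so the context is empty.
    return "\n".join(parts)
```

For example, `build_context(ReflexionStrategy.REFLEXION, trace, reflection)` returns only the reflection text, whereas `ReflexionStrategy.NONE` yields an empty context.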

We include all of the log files from the experiments in the paper in `./root`.

## HumanEval, MBPP, LeetCode Hard (programming)

Move into the programming directory and run `make`:

```bash
cd ./programming_runs && make
```

To run the HumanEval and MBPP simple runs (baselines):

```bash
./run_simple.sh
```

To run the HumanEval and MBPP Reflexion runs:

```bash
./run_reflexion.sh
```

To run the LeetCode Hard simple runs (the `py` and `rs` scripts cover Python and Rust, respectively):

```bash
./run_simple_py_leet.sh
```

```bash
./run_simple_rs_leet.sh
```

To run the LeetCode Hard Reflexion runs (the `py` and `rs` scripts cover Python and Rust, respectively):

```bash
./run_reflexion_py_leet.sh
```

```bash
./run_reflexion_rs_leet.sh
```

To run the ablation experiments:

```bash
./run_immediate_refinement.sh
```

```bash
./run_immediate_reflexion.sh
```

We include all of the log files from the experiments in the paper in `./root`.
