# Crafter AI Agent

This repository contains code for running AI agents in the Crafter environment.

## Installation

### conda environment
```bash
conda env create -f environment.yaml
conda activate venv
```

### library

This code builds on the Crafter environment.
We add two wrappers: (1) one that converts the state information into text for LLM-based agents, and (2) one that randomly initializes the agent's status at reset.
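
The following is a minimal sketch of how two such wrappers could be stacked. It is not the repository's implementation: `DummyEnv`, `RandomInitWrapper`, `TextObsWrapper`, and the status keys are hypothetical stand-ins chosen so the example runs without Crafter installed.

```python
import random


class DummyEnv:
    """Stand-in for the real environment so the sketch runs on its own."""

    def __init__(self):
        self.status = {}

    def reset(self):
        self.status = {"health": 9, "food": 9, "drink": 9, "energy": 9}
        return dict(self.status)

    def step(self, action):
        # The dummy ignores the action and reports the current status.
        return dict(self.status), 0.0, False, {}


class RandomInitWrapper:
    """Randomizes the agent's status values on every reset (hypothetical)."""

    def __init__(self, env, low=1, high=9, seed=None):
        self.env = env
        self.low, self.high = low, high
        self.rng = random.Random(seed)

    def reset(self):
        self.env.reset()
        # Overwrite each status value with a random integer in [low, high].
        for key in self.env.status:
            self.env.status[key] = self.rng.randint(self.low, self.high)
        return dict(self.env.status)

    def step(self, action):
        return self.env.step(action)


class TextObsWrapper:
    """Turns a status dictionary into a short text description (hypothetical)."""

    def __init__(self, env):
        self.env = env

    @staticmethod
    def to_text(obs):
        return ", ".join(f"{key}: {value}" for key, value in sorted(obs.items()))

    def reset(self):
        return self.to_text(self.env.reset())

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        return self.to_text(obs), reward, done, info


# The text wrapper goes outermost so the agent only ever sees strings.
env = TextObsWrapper(RandomInitWrapper(DummyEnv(), seed=0))
print(env.reset())
```

Placing the text wrapper outermost keeps the randomization logic working on structured values, while the LLM-based agent receives plain text.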


## Running Experiments

### 1. Collecting Demonstrations

You can collect demonstrations with an LLM-based agent.
The collected data will be stored as CSV logs (for analysis/training) and optionally as GIFs (for visualization).

```bash
python src/data_collection.py
```
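
As a rough illustration of what a per-step CSV log might look like, the sketch below writes and reads back a few rows with Python's standard `csv` module. The column names in `FIELDS` and the sample rows are assumptions for illustration; the actual columns written by `src/data_collection.py` may differ.

```python
import csv
import io

# Hypothetical column layout; the real logs may use different fields.
FIELDS = ["episode", "step", "observation", "reasoning", "action", "reward"]


def write_log(rows, handle):
    """Write one CSV row per environment step."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


def read_log(handle):
    """Read the log back as a list of dictionaries (all values are strings)."""
    return list(csv.DictReader(handle))


rows = [
    {"episode": 0, "step": 0, "observation": "tree to the left",
     "reasoning": "I need wood first.", "action": "move_left", "reward": 0.0},
    {"episode": 0, "step": 1, "observation": "facing a tree",
     "reasoning": "Collect wood from the tree.", "action": "do", "reward": 1.0},
]

# An in-memory buffer keeps the sketch self-contained; with a real file,
# open it with newline="" as the csv module documentation recommends.
buffer = io.StringIO()
write_log(rows, buffer)
buffer.seek(0)
loaded = read_log(buffer)
print(len(loaded), loaded[1]["action"])
```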

### 2. Data Processing

After collecting demo data, you can relabel each trajectory with a goal instruction:

```bash
python src/goal_relabling.py
```
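
One common way to relabel trajectories is hindsight-style: each step is tagged with the next achievement the agent actually unlocked afterwards. The sketch below shows that idea under stated assumptions. It is not necessarily what `src/goal_relabling.py` does, and the `achievement` and `goal` fields and the `default_goal` value are hypothetical.

```python
def relabel_with_goals(trajectory, default_goal="explore"):
    """Attach to each step the nearest achievement unlocked at or after it."""
    relabeled = []
    next_goal = default_goal
    # Walk backwards so each step sees the closest future achievement.
    for step in reversed(trajectory):
        if step.get("achievement"):
            next_goal = step["achievement"]
        # Copy the step so the input trajectory is left unchanged.
        relabeled.append({**step, "goal": next_goal})
    relabeled.reverse()
    return relabeled


trajectory = [
    {"action": "move_left", "achievement": None},
    {"action": "do", "achievement": "collect_wood"},
    {"action": "move_right", "achievement": None},
    {"action": "place_table", "achievement": "place_table"},
    {"action": "noop", "achievement": None},
]

for step in relabel_with_goals(trajectory):
    print(step["action"], "->", step["goal"])
```

Steps after the last unlocked achievement fall back to `default_goal`, since no later achievement exists to serve as their label.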

### 3. Supervised Fine-Tuning (SFT) Phase

Once you have processed your data with goals, you can train a model using supervised fine-tuning:

```bash
python src/train.py
```

**Important**: Check and adjust the configuration before running the training. It specifies parameters such as:
- Model name
- Training data path (the output of the data processing step)
- Training hyperparameters (learning rate, batch size, etc.)
- Output directories for checkpoints and logs
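
To make the list above concrete, here is a hedged sketch of how such settings could be grouped and sanity-checked before training. The `TrainSettings` class, its field names, and all default values are hypothetical placeholders, not the repository's actual configuration format.

```python
from dataclasses import asdict, dataclass


@dataclass
class TrainSettings:
    """Hypothetical container for the parameters listed above."""

    model_name: str = "your-base-model"          # placeholder, not a real model
    train_data_path: str = "data/relabeled.csv"  # placeholder path
    learning_rate: float = 1e-5
    batch_size: int = 8
    num_epochs: int = 1
    output_dir: str = "outputs/checkpoints"
    log_dir: str = "outputs/logs"

    def validate(self):
        """Fail early on obviously invalid values."""
        if self.learning_rate <= 0:
            raise ValueError("learning_rate must be positive")
        if self.batch_size < 1 or self.num_epochs < 1:
            raise ValueError("batch_size and num_epochs must be at least 1")
        if not self.model_name or not self.train_data_path:
            raise ValueError("model_name and train_data_path must be set")


settings = TrainSettings(batch_size=4)
settings.validate()
for name, value in asdict(settings).items():
    print(f"{name} = {value}")
```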

The SFT process trains the model to generate both reasoning and actions from environment observations, using the goal labels added in the data processing step.

### 4. Evaluation Phase

After fine-tuning the model, you can evaluate its performance on Crafter tasks using two different approaches:


#### Evaluating All 22 Tasks Simultaneously

Evaluate the model on all 22 achievement tasks in a single environment run:

```bash
python src/eval_benchmark.py
```
This approach evaluates how many of the 22 tasks the model can complete in a single run.

#### Evaluating Each Task Individually

Evaluate the model on each of the 22 tasks separately, focusing on one task at a time:

```bash
python src/eval_each_task.py
```
This approach tests the model's ability to complete each task in isolation, providing more detailed performance metrics for each specific task.
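
Results from either approach can be summarized as per-task success rates. The sketch below assumes each episode is recorded as the set of achievements unlocked in it. The `summarize` function and the sample data are illustrative and are not taken from the evaluation scripts.

```python
from collections import Counter


def summarize(episodes, tasks):
    """Return per-task success rates and the mean number of tasks unlocked."""
    if not episodes:
        raise ValueError("at least one episode is required")
    task_set = set(tasks)
    counts = Counter()
    for unlocked in episodes:
        # Count each task at most once per episode and ignore unknown names.
        counts.update(set(unlocked) & task_set)
    n = len(episodes)
    success_rate = {task: counts[task] / n for task in tasks}
    mean_unlocked = sum(counts.values()) / n
    return success_rate, mean_unlocked


# Three of the 22 tasks, used here only to keep the example short.
tasks = ["collect_wood", "place_table", "collect_drink"]
episodes = [
    {"collect_wood"},
    {"collect_wood", "place_table"},
    set(),
    {"collect_wood", "collect_drink"},
]

rates, mean_unlocked = summarize(episodes, tasks)
for task, rate in rates.items():
    print(f"{task}: {rate:.2f}")
print(f"mean tasks unlocked per episode: {mean_unlocked:.2f}")
```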


### 5. Feedback Generation

After running episodes and saving results as CSV logs, you can automatically generate feedback on the agent’s behavior.
This step analyzes the logged trajectories and produces suggestions for improvement.

```bash
python src/feedback.py
```

Feedback generation relies on the prompts defined in `system_prompts.py`.
This step closes the loop: the generated feedback can inform the next round of data collection and training, moving the pipeline toward a self-evolving system.