# Project MiniBossLevel

This project is designed to create a synthetic dataset for training and evaluating large language models (LLMs) on planning and error correction tasks. The dataset consists of pairs of "correct" and "mistake-injected" plans, which are used to fine-tune an LLM to identify and correct errors.


## 💾 Data Creation

To create the training data, follow these steps in order after installing all requirements.

### Step 1: Create Execution Plans

First, generate the core execution plans.

1.  Run the `execution_creation.py` script.
    ```bash
    python execution_creation.py
    ```
2.  After the script finishes, a new file named `MiniBossLevel_mistake.jsonl` will be created in the `data` directory. This file contains the initial plans along with their corresponding mistake-injected versions.
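
A quick way to sanity-check the generated file is to look at the field names of its first few records. The sketch below assumes only that the file is in JSON Lines format (one JSON value per line). `peek_jsonl` is a hypothetical helper, not part of this repository, and the actual field names depend on what `execution_creation.py` writes.

```python
import json
from pathlib import Path


def peek_jsonl(path, limit=3):
    """Return the sorted key names of the first `limit` records in a JSON Lines file.

    Hypothetical helper for manual inspection; not part of the project scripts.
    """
    summaries = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            if isinstance(record, dict):
                summaries.append(sorted(record))
            else:
                # Non-object records are summarized by their Python type name.
                summaries.append(type(record).__name__)
            if len(summaries) >= limit:
                break
    return summaries
```

Call it with the path of the file created in the `data` directory, for example `peek_jsonl("data/<generated file>")`, and compare the printed keys with what your fine-tuning code expects.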

### Step 2: Process the Data

Next, process the data to prepare it for training.

1.  Run the `execution_data_handle.py` script.
2.  To enable step-by-step encoding, set the `step_by_step` flag before running the script. The flag is not passed on the command line; you have to change it in the Python code.
    ```bash
    python execution_data_handle.py
    ```
3.  Once this script completes, the processed training data will be available in the `data` folder as `MiniBossLevel_processed.jsonl` (for full plans) or `MiniBossLevel_mistake_processed.jsonl` (for mistake-injected plans).

This processed data is now ready for fine-tuning your LLM.
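
Before starting a fine-tuning run, it can help to confirm that every line of the processed file parses as JSON. The following is a minimal sketch under the assumption that the output is standard JSON Lines; `count_jsonl_records` is a hypothetical name and does not exist in the project scripts.

```python
import json


def count_jsonl_records(path):
    """Return (number of valid records, line numbers of malformed lines).

    Hypothetical helper; assumes one JSON value per non-empty line.
    """
    valid = 0
    malformed = []
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # blank lines are ignored, not counted as errors
            try:
                json.loads(line)
                valid += 1
            except json.JSONDecodeError:
                malformed.append(line_number)
    return valid, malformed
```

For example, `count_jsonl_records("data/MiniBossLevel_processed.jsonl")` returns the record count together with the list of line numbers that failed to parse. An empty list means the file is at least syntactically sound.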

## 🧪 Evaluation

To evaluate your fine-tuned LLM, you need to set up the server and then run one of the evaluation scripts.

### Step 1: Set Up the LLM Server

Make sure your fine-tuned LLM server is accessible.

-   **Forward your LLM server's port** to `localhost:8009`.
-   Alternatively, you can change the `ClientURL` variable in the `my_llm.py` script to point to your server's address.

> **Note:** If your model is behind an authenticating proxy, you might need to add your authentication method within the `my_llm.py` script.
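
To confirm that the forwarded port is actually open before running an evaluation, a plain TCP connection test is usually enough. This sketch checks only that something accepts connections on `localhost:8009`. It does not verify that the service is your LLM server or that authentication succeeds, and `server_is_reachable` is a hypothetical helper rather than part of `my_llm.py`.

```python
import socket


def server_is_reachable(host="localhost", port=8009, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper; checks reachability only, not the LLM API itself.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

If `server_is_reachable()` returns `False`, check the port forwarding or the `ClientURL` setting before running the evaluation scripts.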

### Step 2: Run the Evaluation

Choose the appropriate evaluation script based on your training data.

-   For **step-by-step** evaluation, run the `step_by_step_eval.py` script.
    ```bash
    python step_by_step_eval.py
    ```
-   For **full planning** evaluation, run the `eval.py` script.
    ```bash
    python eval.py
    ```