# XLogoOnlineMini Benchmark

Welcome to the XLogoOnlineMini Benchmark repository! This project evaluates large models on program synthesis for visual programming tasks using the XLogoMini platform.

## Project Structure

The project is organized as follows:

- `data`: Contains the datasets for training and evaluation. To obtain this folder, first unzip the `data.zip` file.
- `scripts`: Includes scripts for dataset generation, model training, and inference.
- `src`: Source code, split into the following packages:
  - `src/llama_recipes`: Code for fine-tuning large models.
  - `src/xlogomini`: Utility functions for the XLogoMini platform.
  - `src/xlogominidatagen`: Code for generating the synthetic dataset.
  - `src/xlogominiprog`: Code for the program synthesis.


## Installation

We recommend setting up two separate environments: one for fine-tuning and one for inference with [vLLM](https://github.com/vllm-project/vllm), which significantly speeds up inference.

The environments must be separate because fine-tuning requires PEFT+FSDP with a PyTorch nightly build, while vLLM requires a stable PyTorch release.

### Fine-tuning Environment

1. Install PyTorch nightly. Follow the instructions [here](https://pytorch.org/get-started/locally/) to retrieve the correct `--extra-index-url` parameter for your platform.
2. Run the following command to install the requirements:
```bash
pip install -r requirements_ft.txt
```

### vLLM Inference Environment

1. Run the following command to install the requirements for vLLM inference:
```bash
pip install -r requirements_vllm.txt
```
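
Because the two environments differ mainly in their PyTorch build, a quick check of the active environment can save a failed run. The sketch below is a heuristic only: it assumes nightly builds carry a `.dev` segment in their version string, and the helper names `is_nightly` and `installed_torch_version` are illustrative rather than part of this repository.

```python
from importlib import metadata


def is_nightly(version: str) -> bool:
    """Heuristic: nightly builds are assumed to contain a '.dev' segment."""
    return ".dev" in version


def installed_torch_version():
    """Return the installed torch version string, or None if torch is absent."""
    try:
        return metadata.version("torch")
    except metadata.PackageNotFoundError:
        return None


version = installed_torch_version()
if version is None:
    print("torch is not installed in this environment")
else:
    kind = "nightly (fine-tuning)" if is_nightly(version) else "stable (vLLM inference)"
    print(f"torch {version}: looks like a {kind} build")
```

Run it once in each environment; the fine-tuning environment should report a nightly build and the vLLM environment a stable one.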


## Task Visualization

You can visualize an XLogoMini task in two ways:

1. **ASCII Representation**:
```python
import json
from src.xlogominiprog.translator import taskjs2ascii, codejs2python

# Load the synthetic test split (requires data.zip to be unzipped)
with open('./data/xlogomini-dataset-test.json', 'r') as f:
    data = json.load(f)
sample = data[0]

# Print the description of the task
print(sample['task_json']['description'])
# Output: Find only the lemon.

# Print the ASCII representation of the task
print(taskjs2ascii(sample['task_json']))
# Output: 
# +---+---+---+
# | X ‖1RS|   |
# +===+---+---+
# |   | > ‖1YL|
# +===+---+---+
# | X ‖   |   |
# +---+---+---+

# Print the Python solution code of the task
print(codejs2python(sample['code_json']))
# Output:
# def run():
#     turn_right()
#     move_forward()
#     turn_left()
#     move_forward()
#     turn_left()
#     move_forward()
```

2. **Image Representation**:
```python
from src.xlogomini.utils.image_conversions import task2image
task2image(sample['task_json'], show=True, save=False)
```

![Example task image](misc%2Ftask_exmaple.jpg)

## Dataset

### Overview

The dataset is stored in the `data.zip` file. To access the datasets, first unzip this file. Inside, you will find a `data` folder containing the following files:
- Real-world dataset from the XLogoOnline platform:
  - `./data/xlogomini-dataset-test-real.json` (85 samples)
- Synthetic dataset with three splits:
  - `./data/xlogomini-dataset-train.json` (87,053 samples)
  - `./data/xlogomini-dataset-validation.json` (1,000 samples)
  - `./data/xlogomini-dataset-test.json` (1,000 samples)

Each sample in these datasets contains the following fields:

- `task_json`: Task description in JSON format.
- `code_json`: Code solution in JSON format.
- `constraints`: Code constraints of the task.
- `ascii`: ASCII representation of the task.
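
As a quick sanity check after unzipping, the sketch below loads one split and reports samples that lack any of the four fields listed above. It assumes each file is a JSON list of samples (as the indexing in the visualization example suggests); `load_split` and `missing_fields` are illustrative helpers, not functions from this repository.

```python
import json
from pathlib import Path

# Field names taken from the list above.
EXPECTED_FIELDS = ("task_json", "code_json", "constraints", "ascii")


def load_split(path):
    """Load one dataset split; assumes the file holds a JSON list of samples."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def missing_fields(sample):
    """Return the expected fields that are absent from a sample."""
    return [name for name in EXPECTED_FIELDS if name not in sample]


path = Path("./data/xlogomini-dataset-validation.json")
if path.exists():  # only runs once data.zip has been unzipped
    samples = load_split(path)
    incomplete = [i for i, s in enumerate(samples) if missing_fields(s)]
    print(f"{len(samples)} samples, {len(incomplete)} with missing fields")
```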


### Dataset Generation

To generate the dataset, run the following script:

```bash
bash scripts/datagen.sh
```

The generated data will be stored in the `./results/datagen` folder. Use the fine-tuning environment (set up with `pip install -r requirements_ft.txt`) when generating the dataset.

## Fine-tuning

We provide two fine-tuning methods: standard supervised fine-tuning and emulator-driven fine-tuning.

### Standard Fine-tuning

To fine-tune the Llama3-8B model, run:

```bash
python -m torch.distributed.launch \
  --nnodes 1 \
  --nproc_per_node 4 \
  src/xlogominiprog/finetuning.py \
  --dataset "custom_dataset" \
  --custom_dataset.file "./src/xlogominiprog/custom_dataset.py" \
  --custom_dataset.prompt_template "nl" \
  --model_name "meta-llama/Meta-Llama-3-8B" \
  --use_peft \
  --peft_method lora \
  --enable_fsdp \
  --fsdp_config.pure_bf16 \
  --output_dir "./results/checkpoints/nl/Meta-Llama-3-8B/" \
  --use_fast_kernels \
  --train_config.num_epochs 2 \
  --lora_config.r 32 \
  --lora_config.lora_alpha 128 \
  --train_config.batching_strategy "padding" \
  --train_config.seed 42
```

Alternatively, you can use the provided script:

```bash
bash scripts/finetune.sh
```


The LoRA adapter checkpoints will be saved in the folder specified by `--output_dir`. In the above example, the checkpoints will be saved in `./results/checkpoints/nl/Meta-Llama-3-8B/`.
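
For intuition about `--lora_config.r 32` and `--lora_config.lora_alpha 128` in the command above: in the standard LoRA formulation, the low-rank update is scaled by `lora_alpha / r`. The sketch below only works through that arithmetic. It assumes the default (non rank-stabilized) scaling, the helper names are illustrative, and the 4096 x 4096 matrix is an example shape; which modules the adapter actually targets depends on the fine-tuning configuration.

```python
def lora_scaling(r: int, lora_alpha: float) -> float:
    """Scaling factor applied to the low-rank update B @ A in standard LoRA."""
    if r <= 0:
        raise ValueError("r must be a positive integer")
    return lora_alpha / r


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix."""
    # A has shape (r, d_in) and B has shape (d_out, r).
    return r * (d_in + d_out)


print(lora_scaling(r=32, lora_alpha=128))  # 4.0
print(lora_param_count(4096, 4096, 32))    # 262144
```

With these settings the update is scaled by 4, and each adapted 4096 x 4096 matrix gains 262,144 trainable parameters instead of the roughly 16.8 million in the full matrix.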

### Emulator-Driven Fine-tuning

To perform emulator-driven fine-tuning, run:

```bash
bash scripts/finetune_emu.sh
```

> Set the `nnodes` and `nproc_per_node` arguments to match your number of nodes and GPUs per node. You also need access to the model (e.g., `meta-llama/Meta-Llama-3-8B`) on Hugging Face.

## Inference

To evaluate a fine-tuned model, run:

```bash
python src/xlogominiprog/inference_ft_vllm.py \
  --model_name "meta-llama/Meta-Llama-3-8B" \
  --peft_model "PATH_TO_THE_CHECKPOINT" \
  --top_p 1 \
  --temperature 0 \
  --dataset_path "./data/xlogomini-dataset-test-real.json"
```

> Replace `PATH_TO_THE_CHECKPOINT` with the path to the fine-tuned model checkpoint (e.g., `./results/checkpoints/nl/Meta-Llama-3-8B/epoch_1`).

You can also use the provided script for inference:

```bash
bash scripts/inference_vllm_ft.sh
```
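
The `--temperature 0` and `--top_p 1` flags in the inference command request deterministic decoding: vLLM treats a temperature of zero as greedy sampling, and a `top_p` of 1 keeps the full vocabulary. The toy sampler below is a minimal sketch of what these two parameters control. It is not vLLM's implementation, and `sample_token` is an illustrative name.

```python
import math
import random


def sample_token(logits, temperature, top_p, rng=random):
    """Toy next-token sampler with temperature and nucleus (top-p) filtering."""
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        # Greedy decoding: always pick the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Temperature-scaled softmax, shifted by the max for numerical stability.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    weights = [math.exp(x - m) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Keep the smallest set of tokens whose cumulative probability reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


logits = [0.1, 2.0, -1.0]
print(sample_token(logits, temperature=0, top_p=1))  # 1
```

Greedy decoding makes repeated runs on the same checkpoint comparable, which is useful when the metric is a single success rate.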

## Evaluation

We use `success_rate` as the main evaluation metric. You can evaluate the model using the following code:

```python
from src.xlogominiprog.evaluate import eval_model_parallel

# File path to the inference results
file = "./results/inference/nl/Meta-Llama-3-8B/epoch_1_xlogomini-dataset-test-real.json"
results, summary = eval_model_parallel(file)  

# Print the success rate
print(summary['success_rate'])

# Detailed results
print(summary)
```
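
Conceptually, a success rate is the fraction of tasks for which the generated program solves the task. The sketch below shows that computation on hypothetical per-task outcomes. The `success` key and the record layout are assumptions made for illustration and may not match what `eval_model_parallel` actually returns.

```python
def success_rate(outcomes):
    """Fraction of records whose (hypothetical) 'success' flag is true."""
    if not outcomes:
        return 0.0
    return sum(1 for o in outcomes if o["success"]) / len(outcomes)


# Hypothetical outcomes for four tasks; three were solved.
outcomes = [
    {"success": True},
    {"success": True},
    {"success": False},
    {"success": True},
]
print(success_rate(outcomes))  # 0.75
```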

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.


The fine-tuning part of this project is built with [llama-recipes](https://github.com/meta-llama/llama-recipes).