We provide code for both:

1. Task generation from raw datasets
2. Evaluation on the MLE-Smith evaluation set (the 50 tasks mentioned in the experiments)

# Task Generation and Verification

## Set up the conda environment
```bash
conda create -n mle-smith python=3.11
conda activate mle-smith
pip install uv
uv pip install -r requirements.txt
```

## Set up a Kaggle account
Go to your Kaggle account settings, generate a new API token, and move the downloaded `kaggle.json` file to `~/.kaggle/kaggle.json` on Linux/macOS or `C:\Users\<YourUsername>\.kaggle\kaggle.json` on Windows. Refer to the [Kaggle API](https://www.kaggle.com/docs/api) documentation for details.
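
Before running anything that downloads data, it can help to confirm that the token file is in place. The following is a minimal sketch and not part of this repository: `check_kaggle_token` is a hypothetical helper, and it assumes the standard token layout, a JSON object with `username` and `key` fields.

```python
import json
from pathlib import Path


def check_kaggle_token(path=None):
    """Return a list of problems found with the Kaggle API token file.

    Hypothetical helper for illustration; an empty list means the file looks usable.
    """
    token_path = Path(path) if path else Path.home() / ".kaggle" / "kaggle.json"
    if not token_path.is_file():
        return [f"token file not found: {token_path}"]
    try:
        data = json.loads(token_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"token file is not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["token file does not contain a JSON object"]
    # Assumption: a Kaggle token is a JSON object with "username" and "key" entries.
    return [
        f"missing or empty field: {name}"
        for name in ("username", "key")
        if not data.get(name)
    ]


if __name__ == "__main__":
    # Checks the default location under the home directory.
    for problem in check_kaggle_token():
        print(problem)
```

The helper only inspects the file locally; it does not contact Kaggle, so a well-formed but revoked token would still pass.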


## Set up the API key
Create a `.env` file containing `OPENAI_API_KEY=` followed by your key. Other API providers and models can also be used.
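
To illustrate what the `.env` file is expected to contain, here is a minimal standard-library sketch that reads `KEY=VALUE` lines into the process environment. It is an assumption about how such a file is typically consumed, not a description of how the scripts in this repository load it; `load_env_file` is a hypothetical name.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Read KEY=VALUE lines from a .env file into os.environ and return them as a dict."""
    loaded = {}
    env_path = Path(path)
    if not env_path.is_file():
        return loaded
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        # Drop surrounding whitespace and optional quote characters.
        value = value.strip().strip("'\"")
        # Keep values already set in the shell; only fill in missing ones.
        os.environ.setdefault(key, value)
        loaded[key] = value
    return loaded


if __name__ == "__main__":
    load_env_file()
    if not os.environ.get("OPENAI_API_KEY"):
        print("OPENAI_API_KEY is not set")
```

Running the file directly prints a warning when the key is absent, which is a quick way to catch an empty `OPENAI_API_KEY=` line before starting a long generation run.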



## Specify the datasets to generate in mle-list.txt
We provide the list of datasets from which the MLE-Smith evaluation tasks were generated.
Any other dataset can be added to the list.
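
When editing the list by hand, a quick sanity check can catch blank or duplicated entries. The sketch below assumes one dataset identifier per line, which is a guess about the file format; check the provided `mle-list.txt` for the actual layout. `read_dataset_list` is a hypothetical helper, not part of this repository.

```python
from pathlib import Path


def read_dataset_list(path="mle-list.txt"):
    """Return dataset identifiers from the list file, in order, without blanks or duplicates.

    Assumes one identifier per line (an assumption, not confirmed by the repository).
    """
    seen = set()
    datasets = []
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        name = raw_line.strip()
        # Ignore empty lines and entries that already appeared earlier.
        if not name or name in seen:
            continue
        seen.add(name)
        datasets.append(name)
    return datasets


if __name__ == "__main__":
    entries = read_dataset_list()
    print(f"{len(entries)} datasets listed")
```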



## Run the script to generate tasks from raw datasets
```bash
python run.py
```
Additional settings can be changed inside `run.py`.


## Execution-based Verification
Set up the MLE-Dojo environment, then run the verification script on a generated task:
```bash
cd MLE-Dojo
uv pip install -e .
cd ..
python dojo_test.py --competition_name 200-bird-species-with-11788-images_refact1 --refact_index 1
```
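
To verify several generated tasks in one go, the same command can be driven from Python. This is a minimal sketch using only the two flags shown above; `build_verify_command` and `verify_tasks` are hypothetical helpers, and the exit code is recorded as-is, since whether `dojo_test.py` uses it to signal a failed verification is an assumption to check.

```python
import subprocess
import sys


def build_verify_command(competition_name, refact_index):
    """Build the dojo_test.py command line for one generated task."""
    return [
        sys.executable,
        "dojo_test.py",
        "--competition_name", competition_name,
        "--refact_index", str(refact_index),
    ]


def verify_tasks(tasks):
    """Run verification for each (competition_name, refact_index) pair.

    Returns a dict mapping each competition name to the process exit code.
    """
    results = {}
    for competition_name, refact_index in tasks:
        completed = subprocess.run(build_verify_command(competition_name, refact_index))
        results[competition_name] = completed.returncode
    return results


if __name__ == "__main__":
    # Run from the directory that contains dojo_test.py.
    codes = verify_tasks([("200-bird-species-with-11788-images_refact1", 1)])
    print(codes)
```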



# Evaluation on Smith Set

## Prepare the data
```bash
python prepare.py
```

## Build docker image
```bash
cd MLE-Dojo/
docker build -t mle-dojo-smith .
```

## Run evaluation
Set up the config in `./MLE-Dojo/agent_configs/*.yaml`, fill in the API key field, then run:
```bash
bash run_api.sh
```






