# How to run
First, start a server that hosts the small LLMs.
For speed, we serve the models with SGLang and send requests in parallel.
See `run.sh` for details.
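
As a rough illustration of the parallel execution mentioned above, here is a minimal sketch that fans a batch of prompts out over a thread pool. It is not taken from `run.sh`: `run_parallel`, `fake_query`, and `max_workers` are hypothetical names, and the function that actually talks to the server is passed in as `query_fn`, so the sketch assumes nothing about the server's API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def run_parallel(prompts: List[str],
                 query_fn: Callable[[str], str],
                 max_workers: int = 8) -> List[str]:
    """Send every prompt through query_fn concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the same order as the inputs,
        # even when the calls finish out of order.
        return list(pool.map(query_fn, prompts))


def fake_query(prompt: str) -> str:
    """Hypothetical stand-in for a real request to the model server."""
    return prompt.upper()


if __name__ == "__main__":
    print(run_parallel(["first prompt", "second prompt"], fake_query, max_workers=2))
```

Threads fit here because each call spends most of its time waiting on the server, so a small pool is usually enough to keep it busy.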

# Generalization
You might notice that the workflow folder contains many workflow files. In practice there are only two distinct workflows: one for Python and one for non-Python tasks.
The per-task files differ only in how they instruct the model to format its answer. I kept so many separate files out of a habit of unrolling code for convenience. I will merge them when I open-source the code.
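
One way the merge could look, as a sketch only: keep a single prompt builder and look up the answer-format instruction by task name. Everything here is hypothetical, including `FORMAT_INSTRUCTIONS`, `build_prompt`, the task keys, and the placeholder strings. The real instructions live in the existing workflow files.

```python
# Hypothetical placeholders: the actual instruction text is in the workflow files.
FORMAT_INSTRUCTIONS = {
    "game_of_24": "<answer-format instruction for Game of 24>",
    "word_sorting": "<answer-format instruction for Word Sorting>",
}


def build_prompt(task: str, question: str) -> str:
    """Append the task-specific answer-format instruction to the question."""
    try:
        instruction = FORMAT_INSTRUCTIONS[task]
    except KeyError:
        raise ValueError(f"unknown task: {task!r}") from None
    return f"{question}\n\n{instruction}"
```

With this layout, supporting a new task means adding one dictionary entry rather than copying a whole workflow file.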

Overall, I expect the code to generalize to other tasks.

# Models
Llama3.1-8b, Qwen2.5-7b

# Benchmarks
Game of 24, Python Puzzles, Word Sorting, Checkmate in One, Sonnet Writing. Feel free to try others.
