# LMTune

LMTune is a Python package for generating Python scripts based on natural language descriptions. It uses large language models (LLMs) to interpret user requirements and generate executable Python code.

## Installation

LMTune uses uv for package management. First, ensure you have uv installed (see https://docs.astral.sh/uv/).

### Setting Up Development Environment

1. **Create a virtual environment**:
```bash
# Create a virtual environment
uv venv
```

2. **Install LMTune in editable mode**:
```bash
# Install the package and all dependencies
uv pip install -e .
```

This installs LMTune in "editable" mode, meaning changes to the source code are immediately reflected without needing to reinstall.

### Running Commands

After installation, use `uv run` to execute LMTune commands:

```bash
# Generate a script
uv run lmtune-agent --problem-folder ~/problems --problem graph-clear

# Run a previously generated script
uv run lmtune-run --problem-folder ~/problems --problem graph-clear
```

The `uv run` prefix ensures the commands run in the correct virtual environment with all dependencies available.

### Updating Dependencies

After modifying `pyproject.toml`:
```bash
# Sync dependencies
uv sync
```

Or if you just want to reinstall/update:
```bash
# Install/update from pyproject.toml
uv pip install -e .
```

## Usage

Generate Python scripts by running the `lmtune-agent` command with a description:

```bash
uv run lmtune-agent "Create a script that downloads and processes JSON data from a REST API"
```

### Running Previously Generated Scripts

LMTune provides a separate command, `lmtune-run`, for running previously generated scripts with consistent input data handling:

```bash
uv run lmtune-run --problem-folder ~/problems --problem graph-clear
```

By default, `lmtune-run` will execute the most recently generated script. You can specify a particular script using the `--script` parameter:

```bash
uv run lmtune-run --problem-folder ~/problems --problem graph-clear --script ~/problems/graph-clear/lmtuner20250521213357.py
```

This ensures that the script can access the input data and helper functions in the same way, even when it runs outside the generation context.
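By default, `lmtune-run` selects the most recently generated script. Archived scripts are named `lmtuner[timestamp].py` (for example `lmtuner20250521213357.py`), so the newest one can be found by comparing the timestamps embedded in the file names. The sketch below assumes the timestamp format is `YYYYMMDDHHMMSS`, which is inferred from that example file name. `latest_script` is a hypothetical helper, not part of LMTune:

```python
import re
from datetime import datetime
from pathlib import Path
from typing import Optional

# Archived scripts look like lmtuner20250521213357.py (assumed YYYYMMDDHHMMSS).
SCRIPT_PATTERN = re.compile(r"^lmtuner(\d{14})\.py$")


def latest_script(problem_dir: Path) -> Optional[Path]:
    """Return the newest archived lmtuner*.py script, or None if there is none.

    Hypothetical helper: the choice is based on the timestamp in the file
    name, not on file modification times.
    """
    candidates = []
    for path in problem_dir.glob("lmtuner*.py"):
        match = SCRIPT_PATTERN.match(path.name)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
            candidates.append((stamp, path))
    if not candidates:
        return None
    # Pick the entry with the latest parsed timestamp.
    return max(candidates, key=lambda item: item[0])[1]
```

For example, `latest_script(Path("~/problems/graph-clear").expanduser())` returns the script that `--script` would otherwise have to name explicitly. Files that do not match the naming pattern are ignored.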

### Command Line Options

- `--mc`: Specify the model to use. Default is `AT:claude-sonnet-4-20250514`.

  Format:
  - `OA:model` for OpenAI
  - `AT:model` for Anthropic
  - `OR:provider/model` for OpenRouter
  - `GO:model` for Google
  - `LM:model@url` for local models (experimental)

  Example model codes:
  ```
  OA:o3-mini-2025-01-31                  # OpenAI o3-mini model
  OA:o3-2025-04-16                       # OpenAI o3 model
  OA:o4-mini-2025-04-16                  # OpenAI o4-mini model
  AT:claude-sonnet-4-20250514            # Anthropic Claude Sonnet 4 model (default)
  AT:claude-3-7-sonnet-20250219          # Anthropic Claude 3.7 Sonnet model
  OR:google/gemini-2.5-pro-preview       # Google Gemini Pro via OpenRouter
  OR:google/gemini-2.5-flash-preview-05-20  # Google Gemini Flash (recommended) via OpenRouter
  LM:mistral@http://localhost:11434      # Local model (experimental) via specified URL
  ```

- `--timeout`: Set the execution timeout in seconds for generated scripts (default: 30).
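A model code is a provider prefix, a colon, and a provider-specific model name; local models append `@url`. The following minimal parsing sketch accepts exactly the prefixes listed above. `parse_model_code` and `ModelSpec` are hypothetical names, not LMTune's actual API:

```python
from dataclasses import dataclass
from typing import Optional

# Provider prefixes as documented for --mc.
PROVIDERS = {
    "OA": "OpenAI",
    "AT": "Anthropic",
    "OR": "OpenRouter",
    "GO": "Google",
    "LM": "Local",
}


@dataclass(frozen=True)
class ModelSpec:
    provider: str
    model: str
    url: Optional[str] = None  # only used for local (LM:) models


def parse_model_code(code: str) -> ModelSpec:
    """Split a code such as 'AT:claude-sonnet-4-20250514' into its parts."""
    prefix, sep, rest = code.partition(":")
    if not sep or prefix not in PROVIDERS or not rest:
        raise ValueError(f"unrecognized model code: {code!r}")
    if prefix == "LM":
        # The URL itself contains ':', so the model/URL split happens on '@'.
        model, at, url = rest.partition("@")
        if not at or not model or not url:
            raise ValueError(f"local model codes need model@url: {code!r}")
        return ModelSpec(PROVIDERS[prefix], model, url)
    # OpenRouter names keep their provider/model form, e.g. google/gemini-...
    return ModelSpec(PROVIDERS[prefix], rest)
```

Only the first colon separates the prefix, which is why `LM:mistral@http://localhost:11434` still yields the full URL.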

Examples:

```bash
uv run lmtune-agent "Create a web scraper for news headlines"
```

```bash
uv run lmtune-agent --mc OA:o4-mini-2025-04-16 "Create a web scraper for news headlines"
```

```bash
uv run lmtune-agent --mc AT:claude-3-7-sonnet-20250219 "Create a data visualization tool"
```

```bash
uv run lmtune-agent --mc OA:o3-2025-04-16 "Create a web scraper for news headlines"
```

```bash
uv run lmtune-agent --mc OR:google/gemini-2.5-pro-preview "Create a data visualization tool"
```

```bash
uv run lmtune-agent --timeout 60 "Create a web scraper that processes large datasets"
```
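The `--timeout` value bounds how long a generated script may run. One standard way to enforce such a limit in Python is `subprocess.run` with its `timeout` argument. The sketch below assumes this approach and assumes the script prints JSON to stdout; it is not necessarily how LMTune implements the option, and `run_generated_script` is a hypothetical name:

```python
import json
import subprocess
import sys
from pathlib import Path


def run_generated_script(script: Path, timeout: float = 30.0) -> dict:
    """Run a generated script in a fresh interpreter and parse its JSON output.

    Raises subprocess.TimeoutExpired if the script exceeds `timeout` seconds
    (subprocess.run kills the child before re-raising), and RuntimeError on
    a non-zero exit code.
    """
    result = subprocess.run(
        [sys.executable, str(script)],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if result.returncode != 0:
        raise RuntimeError(f"script failed:\n{result.stderr}")
    return json.loads(result.stdout)
```

Running the script in a separate process means a runaway loop in generated code cannot block the caller beyond the timeout.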

## API Keys

The system requires API keys to use the LLM services. Create a `.env` file in the project root with the following keys:

```
OPENAI_API_KEY=your_openai_key_here
ANTHROPIC_API_KEY=your_anthropic_key_here
OPENROUTER_API_KEY=your_openrouter_key_here
```
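Each provider prefix needs its own key. The sketch below reads a `.env` file of `KEY=value` lines using only the standard library and maps the documented prefixes to the key names listed above. The mapping is an assumption: this README does not list key names for `GO:` or `LM:` models, so the sketch leaves them out, and LMTune may load keys differently. `read_env_file` and `api_key_for` are hypothetical helpers:

```python
import os
from pathlib import Path

# Assumed mapping from --mc prefixes to the .env keys documented above.
KEY_FOR_PREFIX = {
    "OA": "OPENAI_API_KEY",
    "AT": "ANTHROPIC_API_KEY",
    "OR": "OPENROUTER_API_KEY",
}


def read_env_file(path: Path) -> dict:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def api_key_for(model_code: str, env_file: Path = Path(".env")) -> str:
    """Return the API key needed by a model code such as 'AT:claude-...'."""
    prefix = model_code.partition(":")[0]
    key_name = KEY_FOR_PREFIX.get(prefix)
    if key_name is None:
        raise KeyError(f"no documented API key for prefix {prefix!r}")
    file_values = read_env_file(env_file) if env_file.exists() else {}
    # Real environment variables take precedence over the .env file.
    value = os.environ.get(key_name) or file_values.get(key_name)
    if not value:
        raise KeyError(f"{key_name} is not set in the environment or {env_file}")
    return value
```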

## Project Structure

- `/src/lmtune`: Source code
- `/prompts`: System prompts for script generation
- `/memos`: Knowledge base for different script types (created automatically)

## Features

- Generate complete, executable Python scripts from natural language descriptions
- Support for multiple LLM providers (OpenAI, Anthropic, Google, OpenRouter, and experimental local models)
- Code validation and execution within the system
- Problem-specific templates with placeholder substitution
- Automatic archiving of successful scripts
- Helper functions for standardized file I/O and JSON output
- Error detection and reporting

## Problem-Specific Script Generation

LMTune supports generating scripts for specific problem types with standardized templates. This allows for more consistent and specialized script generation.

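### Problem Command Line Options

- `--problem-folder`: Specify the folder containing problem type subfolders (required)
- `--problem`: Specify the problem type subfolder within problem-folder (required)
- `--instance`: Specify the JSON instance file within the problem subfolder (optional; used in the examples below)
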
### Example: Constraint Solver Parameter Tuning

The following examples demonstrate how to generate scripts that recommend optimal parameters for the Chuffed constraint solver for different problem types:

**Graph-Clear Problem:**

```bash
uv run lmtune-agent --problem-folder ~/problems --problem graph-clear --mc OR:google/gemini-2.5-flash-preview-05-20
```

**Vehicle Routing Problem:**

```bash
uv run lmtune-agent --problem-folder ~/problems --problem tiny-cvrp --instance medium_instance_06.json --mc OR:google/gemini-2.5-flash-preview-05-20
```

**Fixed Length Error Correcting Codes (FLECC):**

```bash
uv run lmtune-agent --problem-folder ~/problems --problem FLECC --instance inst_000218e22bc9167b4c9eec3a5ab2cab8.json --mc OR:google/gemini-2.5-flash-preview-05-20
```

#### What happens behind the scenes

1. **Template Loading**:
   - The system loads the universal MiniZinc tuning prompt from `/prompts/mzn-tuning.md`
   - It automatically reads the schema from `json-schema.md` in the problem directory (within problem-folder)
   - It automatically finds and includes the first MiniZinc model (*.mzn file) in the problem directory

2. **Placeholder Substitution**:
   - `${INSTANCE}` is replaced with the instance file name (e.g., "planar_n20_seed2022_14.json")
   - `${SCHEMA}` is filled with the JSON schema content
   - `${MODEL}` is filled with the MiniZinc model code

3. **Agent Execution**:
   - The LLM agent generates a Python script based on the completed prompt
   - The script reads the instance data using the provided helper functions
   - It analyzes the instance and recommends optimal solver parameters
   - The results are output in a standardized JSON format

4. **Problem Analysis** (performed by the generated script):
   - The script loads and parses the JSON instance file
   - It analyzes the problem structure using appropriate metrics (e.g., graph metrics for network problems)
   - It identifies instance-specific characteristics that influence solver behavior

5. **Results and Archiving**:
   - The script's output is displayed and parsed
   - The successful Python script is archived as `[problem-folder]/[problem]/lmtuner[timestamp].py`
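The `${INSTANCE}`, `${SCHEMA}` and `${MODEL}` placeholders use the same syntax as Python's `string.Template`, so template loading and substitution can be sketched with the standard library. This assumes the prompt is filled with `string.Template` and that "first" model means first in sorted file-name order; `build_prompt` is a hypothetical helper, not LMTune's code:

```python
from pathlib import Path
from string import Template


def build_prompt(prompt_file: Path, problem_dir: Path, instance: str) -> str:
    """Fill the tuning prompt with the instance name, schema and model."""
    schema = (problem_dir / "json-schema.md").read_text()
    models = sorted(problem_dir.glob("*.mzn"))
    if not models:
        raise FileNotFoundError(f"no .mzn model found in {problem_dir}")
    template = Template(prompt_file.read_text())
    # safe_substitute leaves unknown placeholders and stray '$' signs as-is
    # instead of raising, which matters for prompts that contain code.
    return template.safe_substitute(
        INSTANCE=instance,
        SCHEMA=schema,
        MODEL=models[0].read_text(),
    )
```

For example, `build_prompt(Path("prompts/mzn-tuning.md"), Path("~/problems/graph-clear").expanduser(), "planar_n20_seed2022_14.json")` would return the prompt that is sent to the agent.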

Example output (from a vehicle routing instance):
```json
{
  "README": "This instance was analyzed by constructing a complete graph from the distance matrix, where nodes represent locations (depot and customers). The graph_density of 1.0 indicates a fully connected network typical of VRP instances. The demand_variance measures heterogeneity in customer demands, affecting route balancing. Spatial_clustering analyzes the geographic distribution of customers based on distances from the depot. Route_flexibility indicates how many different routing options are available given capacity constraints. The capacity_utilization shows how tightly the vehicle capacity constrains the solution space. These parameters help differentiate between instances requiring different solving strategies - high density with low flexibility suggests difficult combinatorial optimization, while clustered instances may benefit from geographic decomposition approaches.",
  
  "graph_density": 1.0,
  "demand_variance": 0.447,
  "spatial_clustering": 0.623,
  "route_flexibility": 4.0,
  "capacity_utilization": 0.85
}
```
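The tuning prompt asks for a `README` string plus five instance parameters (see the 2025-06-02 version notes). The following minimal check of that structure assumes every non-README field must be numeric, as in the example above; `check_output` is a hypothetical helper:

```python
import json
from numbers import Real

EXPECTED_PARAMETERS = 5  # README plus five instance parameters


def check_output(raw: str) -> dict:
    """Parse script output and verify a README and five numeric parameters."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("output must be a JSON object")
    readme = data.get("README")
    if not isinstance(readme, str) or not readme.strip():
        raise ValueError("missing or empty README field")
    params = {k: v for k, v in data.items() if k != "README"}
    if len(params) != EXPECTED_PARAMETERS:
        raise ValueError(f"expected {EXPECTED_PARAMETERS} parameters, got {len(params)}")
    for name, value in params.items():
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, Real):
            raise ValueError(f"parameter {name!r} is not numeric: {value!r}")
    return params
```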

### Included Problem Types

LMTune comes with support for several constraint optimization problems:

1. **Graph-Clear**: A vertex cover problem on graphs
2. **CVRP (tiny-cvrp)**: Capacitated Vehicle Routing Problem for logistics optimization
3. **Vehicle Routing Problem (vrp)**: Full-scale VRP with distance matrices and capacity constraints
4. **Mario**: Path optimization problem with fuel constraints and gold collection objectives
5. **FLECC**: Fixed Length Error Correcting Codes problem for constructing optimal error-correcting codes
6. **Car Sequencing**: Production scheduling problem for automotive assembly lines

Each problem directory contains:
- A `json-schema.md` file describing the input format
- A MiniZinc model (*.mzn file) defining the constraint problem
- Sample JSON instance files to test with
- Example files (`example.dzn` and `example.json`) for reference

### Creating New Problem Types

To add a new problem type:

1. Create a directory under your problem folder (specified with `--problem-folder`)
2. Add exactly one MiniZinc model file with the `.mzn` extension (required)
3. Convert DZN files to JSON format (see below)
4. Create a `json-schema.md` file describing the input format (required)
5. Create example files for documentation

All problem types use the universal MiniZinc tuning prompt from `/prompts/mzn-tuning.md` which handles parameter tuning for constraint solvers. The system will automatically find and use the schema file and MiniZinc model.
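Before running `lmtune-agent` on a new problem type, it can help to check the directory against the requirements above: exactly one `.mzn` model, a `json-schema.md` file, and JSON instances. The following checker is hypothetical and not part of LMTune; treating `example.json` as documentation rather than an instance is an assumption:

```python
from pathlib import Path
from typing import List


def check_problem_dir(problem_folder: Path, problem: str) -> List[str]:
    """List issues with <problem_folder>/<problem>; an empty list means OK."""
    problem_dir = problem_folder / problem
    if not problem_dir.is_dir():
        return [f"{problem_dir} is not a directory"]
    issues = []
    models = list(problem_dir.glob("*.mzn"))
    if len(models) != 1:
        issues.append(f"expected exactly one .mzn model, found {len(models)}")
    if not (problem_dir / "json-schema.md").is_file():
        issues.append("missing json-schema.md")
    # example.json documents the format; it is not counted as an instance.
    instances = [p for p in problem_dir.glob("*.json") if p.name != "example.json"]
    if not instances:
        issues.append("no JSON instance files (convert .dzn files with dzn-to-json)")
    return issues
```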

#### Converting DZN to JSON

LMTune includes a tool to convert MiniZinc data files (*.dzn) to JSON format using the PyMzn library:

```bash
uv run dzn-to-json <input_file_or_directory>
```

To convert all DZN files in a problem directory:
```bash
uv run dzn-to-json problems/vrp/
```

#### Problem Preparation Workflow

For each new problem type, follow these steps:

1. **Convert DZN files to JSON**:
   ```bash
   uv run dzn-to-json problems/<problem-name>/
   ```

2. **Create a json-schema.md file** documenting the JSON format:
   ````markdown
   # JSON Schema for [Problem Name]
   
   ## Schema
   ```json
   {
     "field1": <type>,
     "field2": [<type>, ...],
     ...
   }
   ```
   
   ## Fields
   - **field1**: Description
   - **field2**: Description
   ````

3. **Create example files** for documentation:
   - Create `example.dzn` with a small but complete instance
   - Convert it: `uv run dzn-to-json problems/<problem-name>/example.dzn`
   - This produces `example.json` showing the expected format

#### Example: Vehicle Routing Problem (VRP)

The VRP problem directory contains:
- `vrp.mzn`: MiniZinc model
- `json-schema.md`: JSON format documentation
- `example.dzn` and `example.json`: Small example showing all features
- Multiple instance files in JSON format (converted from DZN)

The JSON format for VRP includes:
```json
{
  "N": 3,
  "Capacity": 50,
  "Demand": [10, 20, 15],
  "Distance": [0, 5, 8, 12, 5, 0, 6, 9, 8, 6, 0, 7, 12, 9, 7, 0]
}
```

Note: 2D arrays (like Distance matrices) are flattened into 1D arrays by PyMzn.
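Because the matrix arrives flattened, a generated script that needs it must rebuild it. In the VRP example, `Distance` has 16 entries for the depot plus `N = 3` customers, i.e. a 4-by-4 matrix. The sketch below assumes row-major order (the first four values form row 0) and computes a simple graph density that counts every pair of distinct locations with a positive distance as an edge. Both `unflatten` and this density definition are illustrative assumptions, not LMTune's metrics:

```python
import json
import math
from typing import List


def unflatten(values: List[float]) -> List[List[float]]:
    """Rebuild a square matrix from a flattened, row-major list."""
    n = math.isqrt(len(values))
    if n * n != len(values):
        raise ValueError(f"{len(values)} values do not form a square matrix")
    return [values[i * n:(i + 1) * n] for i in range(n)]


def graph_density(matrix: List[List[float]]) -> float:
    """Fraction of unordered location pairs joined by a positive distance."""
    n = len(matrix)
    if n < 2:
        return 0.0
    edges = sum(1 for i in range(n) for j in range(i + 1, n) if matrix[i][j] > 0)
    return edges / (n * (n - 1) / 2)


instance = json.loads(
    '{"N": 3, "Capacity": 50, "Demand": [10, 20, 15],'
    ' "Distance": [0, 5, 8, 12, 5, 0, 6, 9, 8, 6, 0, 7, 12, 9, 7, 0]}'
)
distance = unflatten(instance["Distance"])
assert len(distance) == instance["N"] + 1  # depot plus N customers
print(distance[0])              # [0, 5, 8, 12]
print(graph_density(distance))  # 1.0
```

A fully connected distance matrix gives a density of 1.0, consistent with the `graph_density` value in the example output above.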

## Versions

### 2025-08-21 (updated)
- Added FLECC (Fixed Length Error Correcting Codes) problem support 
- Created JSON schema documentation for FLECC instances
- Added FLECC example files (example.dzn/json) for reference
- Updated README with FLECC problem type and usage examples
- FLECC problem includes 7,180 instances for comprehensive algorithm selection research

### 2025-06-02 (updated)
- Changed default model from OpenAI o4-mini to Anthropic Claude Sonnet 4 (AT:claude-sonnet-4-20250514)
- Added CLAUDE_SONNET4 to available model codes in config.py
- Enhanced MiniZinc tuning prompt to require structured output with README and 5 instance parameters
- Simplified instance analysis output format for better clarity
- Added Mario problem support with JSON conversions and schema documentation

### 2025-06-02
- Added DZN to JSON converter using PyMzn for accurate MiniZinc data conversion
- Added dzn-to-json command for easy access to conversion tool
- Expanded documentation for problem preparation workflow
- Added example files (example.dzn/json) for VRP problem

### 2025-05-21
- Added `--problem-folder` parameter to support custom problem directories
- Simplified script generation with helper functions for I/O operations
- Improved the import of the `input_data()` and `output_results()` helper functions
- Added support for Google Gemini Flash model
- Added new `lmtune-run` command for executing previously generated scripts

### 2025-05-19
- Added universal MiniZinc tuning prompt for all problem types
- Streamlined command line parameters (automatic model file detection)
- Added NumPy support to script dependencies
- Added timeout parameter for script execution
- Added support for Capacitated Vehicle Routing Problem (CVRP)

### 2025-05-18
- First version with basic functionality
- Support for problem-specific templates
- Graph-clear problem type implemented

