# Schema2OptSGD

**Database Schema to Optimization Problem Generator with Alternating Optimization**

This repository automatically generates optimization problems from database schemas using an alternating optimization approach. An OR Expert role and a Data Engineer role (see the prompt templates under `templates/`) iteratively collaborate to create Linear Programming (LP) and Mixed-Integer Programming (MIP) formulations, with multi-solver validation and realistic data generation.

## Usage

Add your RITS API key to `config.py` to enable Llama-3.3-70B access.

### Basic Usage

Generate an optimization problem from a single database:
```bash
python -m main --database "flight_2" --spider-dir "../spider"
```

Process multiple databases in parallel:
```bash
python parallel_runner.py --parallel 120 --spider-dir "../spider"
```

List available databases:
```bash
python parallel_runner.py --list-databases --spider-dir "../spider"
```

## Configuration

Key parameters in `config.py`:

```python
MAX_ALTERNATING_ITERATIONS = 5      # OR Expert ↔ Data Engineer iterations
MAX_TABLES = 5                      # Database complexity limit
SOLVER_TIMEOUT = 500                # Execution timeout per solver
VERIFICATION_THRESHOLD = 0.99       # Consistency score requirement
```
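As a rough illustration of how `MAX_ALTERNATING_ITERATIONS` and `VERIFICATION_THRESHOLD` could interact, the sketch below runs a generic alternating loop that stops once a score reaches the threshold or the iteration budget is spent. It assumes the threshold acts as a stopping criterion, which this README does not confirm. The function `alternate` and its callback parameters are hypothetical and are not taken from `core_generation.py`; only the two constant names and values come from `config.py`.

```python
from typing import Callable, Tuple

MAX_ALTERNATING_ITERATIONS = 5   # value listed in config.py
VERIFICATION_THRESHOLD = 0.99    # value listed in config.py


def alternate(
    state: dict,
    or_expert_step: Callable[[dict], dict],      # hypothetical: revises the formulation
    data_engineer_step: Callable[[dict], dict],  # hypothetical: revises the schema/data
    score: Callable[[dict], float],              # hypothetical: consistency score in [0, 1]
) -> Tuple[dict, int, float]:
    """Alternate the two steps until the score meets the threshold or the budget runs out."""
    current = score(state)
    iterations = 0
    while iterations < MAX_ALTERNATING_ITERATIONS and current < VERIFICATION_THRESHOLD:
        state = or_expert_step(state)
        state = data_engineer_step(state)
        current = score(state)
        iterations += 1
    return state, iterations, current


if __name__ == "__main__":
    # Toy stand-ins: each step nudges a fake quality value upward.
    final_state, used, final_score = alternate(
        {"quality": 0.0},
        or_expert_step=lambda s: {"quality": s["quality"] + 0.3},
        data_engineer_step=lambda s: {"quality": s["quality"] + 0.3},
        score=lambda s: min(s["quality"], 1.0),
    )
    print(f"stopped after {used} iterations with score {final_score}")
```

In this reading, raising `MAX_ALTERNATING_ITERATIONS` would allow more refinement rounds, while lowering `VERIFICATION_THRESHOLD` would let the loop exit earlier.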

## Repository Structure

```
schema2optsgd/
├── main.py                      # Single database processing
├── parallel_runner.py           # Parallel processing orchestrator
├── core_generation.py           # Alternating optimization algorithm
├── verification.py              # Mathematical verification system
├── solver_execution.py          # Multi-solver execution engine
├── api_client.py                # LLM API communication
├── prompts.py                   # Prompt generation functions
├── file_manager.py              # File operations and state management
├── utils.py                     # Utility functions
├── config.py                    # Configuration parameters
└── templates/                   # Prompt and code templates
    ├── or_expert_initial.txt
    ├── data_engineer.txt
    ├── mathematical_solution.txt
    └── solver_templates/
        ├── gurobipy_template.py
        ├── docplex_template.py
        └── pyomo_template.py
```

## Output Structure

Each processed database generates:

```
./text2opt_dataset_alternating_optimization/database_id/
├── problem_solution_description.md    # Complete 8-section documentation
├── mathematical_solution.md           # Sections 4-8 (formulation + solvers)
├── or_analysis.json                   # Final optimization analysis
├── solver_execution_results.json      # Multi-solver summary
├── gurobipy_code.py                   # Generated solver implementations
├── docplex_code.py
├── pyomo_code.py
├── debug_prompts/                     # All LLM interactions
├── solver_logs/                       # Execution details
└── logs/                              # System logs
```
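
To check whether a run produced everything listed in the tree above, a small helper along these lines may be useful. It is a minimal sketch: `missing_outputs` is a hypothetical name, not part of this repository, and the file and directory names are simply copied from the listing.

```python
from pathlib import Path
from typing import List

# Names copied from the output tree in this README.
EXPECTED_FILES = [
    "problem_solution_description.md",
    "mathematical_solution.md",
    "or_analysis.json",
    "solver_execution_results.json",
    "gurobipy_code.py",
    "docplex_code.py",
    "pyomo_code.py",
]
EXPECTED_DIRS = ["debug_prompts", "solver_logs", "logs"]


def missing_outputs(output_dir: str) -> List[str]:
    """Return the expected files and directories that are absent from output_dir."""
    root = Path(output_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return missing
```

For example, `missing_outputs("./text2opt_dataset_alternating_optimization/flight_2")` would return an empty list for a complete run, assuming the `database_id` directory is named after the database passed to `--database`.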