# Data2Decision: Database-Grounded Prescriptive Analytics System

## Overview
Data2Decision is a prescriptive analytics system that connects enterprise databases to optimal decision-making through a novel two-stage pipeline (SQL data retrieval followed by optimization code generation) with test-time scaling.

## Running Experiments

### Main Method: Data2Decision

### Implementation Directory
```
baselines/
├── majority_2_stage_mul_solver/               # Data2Decision main implementation
├── majority_2_stage_sing_solver/              # Data2Decision single solver variant  
├── majority_3_stage_mul_solver/               # Data2Decision three-stage variant
```

### Core Implementation

The main Data2Decision system is in `majority_2_stage_mul_solver/` with two API versions:
- `integrated_optimizer_large.py` - Uses DeepSeek native API (recommended for reproduction)
- `integrated_optimizer.py` - Uses RITS internal API

### Running Data2Decision

```bash
cd baselines/majority_2_stage_mul_solver

python integrated_optimizer_large.py \
  --syn_data_dir /path/to/text2opt_dataset \
  --output_dir ../majority_two_stage_mul_solver_results_10_005_005_000_002 \
  --num_attempts 10 \
  --sql_temp_base 0.05 \
  --sql_temp_increment 0.05 \
  --code_temp_base 0.0 \
  --code_temp_increment 0.02 \
  --max_workers 30
```

**Key Parameters:**
- `num_attempts`: Number of parallel attempts combined by majority voting (10 gave the best results in our experiments)
- `sql_temp_base/increment`: Temperature progression for SQL diversity (0.05→0.50)
- `code_temp_base/increment`: Temperature progression for code diversity (0.00→0.18)
- `max_workers`: Parallel processing of different problems
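
The temperature ranges above are consistent with a linear ramp over the attempts. The sketch below assumes that attempt `i` (starting at 0) uses `base + i * increment`; the function name `temperature_schedule` is hypothetical and not taken from the repository.

```python
def temperature_schedule(base: float, increment: float, num_attempts: int) -> list[float]:
    """Return one temperature per attempt, assuming a linear ramp: base + i * increment."""
    # Rounding hides floating-point noise such as 0.15000000000000002.
    return [round(base + i * increment, 4) for i in range(num_attempts)]


# Values from the example command: 10 attempts.
sql_temps = temperature_schedule(0.05, 0.05, 10)
code_temps = temperature_schedule(0.0, 0.02, 10)

print(sql_temps[0], sql_temps[-1])    # 0.05 0.5
print(code_temps[0], code_temps[-1])  # 0.0 0.18
```

Under this assumption, raising `num_attempts` also raises the highest temperature reached, so the endpoints 0.50 and 0.18 apply only to the 10-attempt setting.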

### Variants

#### Single Solver Variant
```bash
cd baselines/majority_2_stage_sing_solver

# Uses only Gurobi instead of cycling through multiple solvers
python integrated_optimizer.py \
  --syn_data_dir /path/to/text2opt_dataset \
  --output_dir ../majority_two_stage_sing_solver_results_10_005_005_000_002 \
  --num_attempts 10 \
  --sql_temp_base 0.05 \
  --sql_temp_increment 0.05 \
  --code_temp_base 0.0 \
  --code_temp_increment 0.02
```

#### Three-Stage Variant
```bash
cd baselines/majority_3_stage_mul_solver

# Includes intermediate mathematical modeling stage
python integrated_optimizer_three_stage.py \
  --syn_data_dir /path/to/text2opt_dataset \
  --output_dir ../majority_three_stage_mul_solver_results_10_005_005_000_002 \
  --num_attempts 10 \
  --sql_temp_base 0.05 \
  --sql_temp_increment 0.05 \
  --code_temp_base 0.0 \
  --code_temp_increment 0.02
```

### API Configuration

Before running, configure your API key in the respective script:

```python
# For integrated_optimizer_large.py (DeepSeek API)
DEEPSEEK_API_KEY = ""  # Add your DeepSeek API key

# For integrated_optimizer.py (RITS API)  
RITS_API_KEY = ""  # Add your RITS API key
```

### Output Format

All variants produce the same standardized output structure:
```
results_directory/
├── problem_name/
│   ├── code_output.txt         # Final result with majority vote
│   ├── summary.json            # Detailed execution summary
│   └── attempt_*/              # Individual attempt details
└── overall_summary.json        # Complete run statistics
```
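
The final result in `code_output.txt` comes from a majority vote over the individual attempts. The following is a minimal sketch of such a vote, not the repository's implementation: it assumes each attempt yields a numeric objective value (or `None` on failure), that values are compared after rounding, and that ties go to the value seen first. The name `majority_vote` and the `ndigits` parameter are hypothetical.

```python
from collections import Counter
from typing import Optional


def majority_vote(values: list[Optional[float]], ndigits: int = 4) -> Optional[float]:
    """Return the most common successful value after rounding, or None if all attempts failed."""
    rounded = [round(v, ndigits) for v in values if v is not None]
    if not rounded:
        return None
    # Counter.most_common keeps first-seen order among equal counts.
    winner, _count = Counter(rounded).most_common(1)[0]
    return winner


# Five attempts: three agree (up to rounding), one fails, one disagrees.
attempts = [120.0, 120.00001, None, 95.5, 120.0]
print(majority_vote(attempts))  # 120.0
```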


# Baselines Directory Structure and Execution Guide

## Main Method: Data2Decision
```
├── majority_2_stage_mul_solver/         # Main Data2Decision implementation
├── majority_two_stage_mul_solver_results_10_005_005_000_002/  # Optimal configuration (69.5% accuracy)
```

Note: The evaluation scripts may refer to this method as "Hierarchical-Graph-Agent" for historical reasons; it is the same Data2Decision system described in this paper.

## Execution Pipeline
### Step 1: Stage 1 SQL Data Retrieval for Text-to-OPT Baselines

#### Overview
`stage1_sql_retrieval.py` is a preprocessing script that enhances optimization problems by extracting relevant data from databases using LLM-generated SQL queries.

#### Purpose
Since Text-to-OPT baseline methods cannot extract data from databases independently, this script provides a unified first stage that:
1. Analyzes business requirements from problem descriptions
2. Generates SQL queries using LLM (DeepSeek-V3)
3. Executes queries against SQLite databases
4. Creates enhanced problem descriptions with retrieved data
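
The four steps above can be sketched as follows. This is an illustration under stated assumptions, not the code in `stage1_sql_retrieval.py`: the LLM call is replaced by a hypothetical stub `generate_sql` that returns a fixed query, the database is built in memory from the contents of `schema.sql` and `data.sql`, and the `## Retrieved Data` layout of the enhanced description is invented for the demo.

```python
import sqlite3


def load_database(schema_sql: str, data_sql: str) -> sqlite3.Connection:
    """Build an in-memory SQLite database from schema.sql and data.sql contents."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(schema_sql)
    conn.executescript(data_sql)
    return conn


def generate_sql(problem_description: str) -> str:
    """Hypothetical stand-in for the LLM call; returns a fixed query for this demo."""
    return "SELECT name, capacity FROM stadium ORDER BY name"


def enhance_problem(problem_description: str, conn: sqlite3.Connection, query: str) -> str:
    """Run the query and append its rows to the problem text as a plain-text table."""
    cursor = conn.execute(query)
    header = [col[0] for col in cursor.description]
    rows = cursor.fetchall()
    lines = [" | ".join(header)]
    lines += [" | ".join(str(value) for value in row) for row in rows]
    return f"{problem_description}\n\n## Retrieved Data\n" + "\n".join(lines)


# Toy inputs standing in for problem_description.md, schema.sql, and data.sql.
schema = "CREATE TABLE stadium (name TEXT, capacity INTEGER);"
data = "INSERT INTO stadium VALUES ('North', 500), ('South', 300);"
problem = "Assign concerts to stadiums to maximize attendance."

conn = load_database(schema, data)
enhanced = enhance_problem(problem, conn, generate_sql(problem))
print(enhanced)
```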

#### Input/Output

##### Input Structure
```
text2opt_dataset_alternating_optimization/
├── activity_1/
│   ├── problem_description.md
│   └── schema_cache/latest/
│       ├── schema.sql
│       └── data.sql
├── concert_singer/
│   └── ...
```

##### Output Structure
```
stage_1_enhanced_problems/
├── activity_1/
│   ├── enhanced_problem_description.md  # Problem + Retrieved data
│   ├── stage1_llm_sql_generation.txt    # LLM reasoning log
│   └── stage1_results.json              # Execution summary
├── concert_singer/
│   └── ...
└── stage1_summary.json                  # Overall processing stats
```

#### Usage
```bash
python stage1_sql_retrieval.py \
  --syn_data_dir /path/to/text2opt_dataset_alternating_optimization \
  --output_dir ./stage_1_enhanced_problems \
  --max_workers 8 \
  --max_problems 120
```


### Step 2: Run Text-to-OPT Baseline Methods
After Stage 1, run each Text-to-OPT method on the enhanced problems.

#### Example: Running OR-LLM-Agent

**1. Set API Key**

Edit `baselines/or_llm_agent/or_llm_eval.py`:
```python
def setup_rits_api_client():
    os.environ['RITS_API_KEY'] = ''  # <-- Add your RITS API Key here
```

Alternatively, create a `.env.rits` file with your API configuration.

**2. Execute OR-LLM-Agent**
```bash
cd baselines/or_llm_agent

python stage2_or_llm_agent.py \
  --enhanced_problems_dir ../stage_1_enhanced_problems \
  --output_dir ../or_llm_agent_two_stage_results \
  --agent \
  --model "DeepSeek-V3" \
  --max_workers 4
```

The `--agent` flag enables the full OR-LLM-Agent pipeline with mathematical modeling. Without it, the system uses simple direct code generation.

Results will be saved in `or_llm_agent_two_stage_results/`

#### Example: Running OptiMUS

**1. Set API Key**

Edit `baselines/OptiMUS/utils.py`:
```python
def setup_paste_api_client():
    os.environ['RITS_API_KEY'] = ''  # <-- Add your RITS API Key here
```

**2. Execute OptiMUS**
```bash
cd baselines/OptiMUS

python stage2_optimus.py \
  --enhanced_problems_dir ../stage_1_enhanced_problems \
  --output_dir ../optimus_two_stage_results \
  --model "deepseek-ai/DeepSeek-V3" \
  --error_correction \
  --max_workers 8
```

Results will be saved in `optimus_two_stage_results/`

#### Example: Running Simple Zero-Shot

Simple Zero-Shot is the simplest baseline: it prompts the LLM to generate Gurobi code directly, without intermediate steps.

**1. Set API Key**

Edit `baselines/simple_zero_shot/simple_zero_shot_solver.py`:
```python
def setup_api_client(self):
    os.environ['RITS_API_KEY'] = ''  # <-- Add your RITS API Key here
```

**2. Execute Simple Zero-Shot**
```bash
cd baselines/simple_zero_shot

python stage2_simple_zero_shot.py \
  --enhanced_problems_dir ../stage_1_enhanced_problems \
  --output_dir ../simple_zero_shot_two_stage_results \
  --model "DeepSeek-V3" \
  --temperature 0.1 \
  --max_workers 8
```

Results will be saved in `simple_zero_shot_two_stage_results/`

This method uses a single LLM call to generate complete Gurobi code, making it faster but potentially less accurate than multi-step approaches like OR-LLM-Agent or OptiMUS.

#### Example: Running Chain-of-Experts

Chain-of-Experts uses a multi-expert collaboration system where different domain experts work together to solve optimization problems.

**1. Set API Keys**

Edit the following files to add your API key:
- `baselines/Chain-of-Experts/conductor.py`
- `baselines/Chain-of-Experts/reducer.py`
- `baselines/Chain-of-Experts/experts/base_expert.py`

```python
default_headers={"RITS_API_KEY": ""}  # <-- Add your RITS API Key here
```

**2. Execute Chain-of-Experts**
```bash
cd baselines/Chain-of-Experts

python stage2_chain_of_experts.py \
  --enhanced_problems_dir ../stage_1_enhanced_problems \
  --output_dir ../chain_of_experts_two_stage_results \
  --model "deepseek-ai/DeepSeek-V3" \
  --max_collaborate_nums 3 \
  --enable_reflection \
  --max_workers 4
```

Results will be saved in `chain_of_experts_two_stage_results/`

The system orchestrates multiple domain experts (modeling, programming, code review, etc.) through a conductor, then uses a reducer to synthesize their insights into the final solution.

### Step 3: Run End-to-End Foundation Models

Foundation models handle both SQL extraction and optimization independently, without requiring Stage 1 preprocessing.

#### Unified Baseline Runner

`baseline_call_method/baseline.py` provides a unified interface for testing all foundation models:

**1. Set API Key**

Edit `baselines/baseline_call_method/baseline.py`:
```python
RITS_API_KEY = ""  # <-- Add your RITS API Key here
```

**2. Run Different Foundation Models**

```bash
cd baselines/baseline_call_method

# Llama-3.3-70B (Best performer: 53.7%/47.2%)
python baseline.py \
  --syn_data_dir /path/to/text2opt_dataset \
  --output_dir ../baseline_llama33 \
  --model llama-3-3-70b \
  --max_workers 2

# Qwen2.5-72B  
python baseline.py \
  --syn_data_dir /path/to/text2opt_dataset \
  --output_dir ../baseline_qwen \
  --model qwen2-5-72b \
  --max_workers 2

# Phi-4
python baseline.py \
  --syn_data_dir /path/to/text2opt_dataset \
  --output_dir ../baseline_phi4 \
  --model phi-4 \
  --max_workers 2

# Llama-4-Scout
python baseline.py \
  --syn_data_dir /path/to/text2opt_dataset \
  --output_dir ../baseline_llama4 \
  --model llama-4-scout \
  --max_workers 2
```

Results are saved in their respective directories in the standardized `code_output.txt` format for evaluation.

**Note**: These models work directly with the original dataset and don't require the enhanced problems from Stage 1, making them true end-to-end solutions.

### Step 4: Ablation Studies

#### Architecture Ablations
```
├── abl_two_vs_three_stage/             # Two-stage vs three-stage pipeline comparison
├── abl_single_vs_multi_solver/         # Single vs multi-solver consensus comparison
├── abl_temp_strategy_three_way/        # Temperature strategy comparison
```

Run ablation comparisons:
```bash
# Single vs Multi-solver
python comprehensive_baseline_evaluation_flex.py \
  --b1_name "single_solver" --b1_folder majority_2_stage_sing_solver \
  --b2_name "multi_solver" --b2_folder majority_2_stage_mul_solver \
  --output_dir abl_single_vs_multi_solver
```

#### Parameter Analysis (Test-Time Scaling)
Test different numbers of attempts to analyze the scaling law:

```bash
cd baselines/majority_2_stage_mul_solver

# Run with different attempt counts
for attempts in 1 3 6 10 12 24; do
  python integrated_optimizer_large.py \
    --syn_data_dir /path/to/text2opt_dataset \
    --output_dir ../majority_two_stage_mul_solver_results_${attempts}_005_005_000_002 \
    --num_attempts ${attempts} \
    --sql_temp_base 0.05 \
    --sql_temp_increment 0.05 \
    --code_temp_base 0.0 \
    --code_temp_increment 0.02
done
```

Temperature parameter analysis:
```bash
# Fixed low temperature
python integrated_optimizer_large.py \
  --syn_data_dir /path/to/text2opt_dataset \
  --output_dir ../majority_two_stage_mul_solver_results_10_001_0_001_0 \
  --num_attempts 10 \
  --sql_temp_base 0.01 \
  --sql_temp_increment 0.0 \
  --code_temp_base 0.01 \
  --code_temp_increment 0.0

# Incremental temperature (optimal)
python integrated_optimizer_large.py \
  --syn_data_dir /path/to/text2opt_dataset \
  --output_dir ../majority_two_stage_mul_solver_results_10_005_005_000_002 \
  --num_attempts 10 \
  --sql_temp_base 0.05 \
  --sql_temp_increment 0.05 \
  --code_temp_base 0.0 \
  --code_temp_increment 0.02
```

### Step 5: Comprehensive Evaluation

Evaluate all methods together using the evaluation script with fixed valid cases:

```bash
python comprehensive_baseline_evaluation.py \
  --synthetic_dir /path/to/text2opt_dataset \
  --or_llm_agent_dir or_llm_agent_two_stage_results \
  --optimus_dir optimus_two_stage_results \
  --simple_zero_shot_dir simple_zero_shot_two_stage_results \
  --chain_of_experts_dir chain_of_experts_two_stage_results \
  --hierarchical_graph_agent_dir majority_two_stage_mul_solver_results_10_005_005_000_002 \
  --output_dir comprehensive_evaluation_results \
  --auto_proceed
```

**Note**: The `--hierarchical_graph_agent_dir` parameter refers to our Data2Decision method. This naming is kept for backward compatibility with the evaluation scripts.

For flexible comparisons (e.g., ablation studies), use:

```bash
python comprehensive_baseline_evaluation_flex.py \
  --b1_name "Method-A" --b1_folder path/to/method_a_results \
  --b2_name "Method-B" --b2_folder path/to/method_b_results \
  --output_dir comparison_results \
  --auto_proceed
```

## Result Naming Convention
`majority_[stages]_stage_[solver]_results_[attempts]_[sql_temp_base]_[sql_inc]_[code_temp_base]_[code_inc]`

Example: `majority_two_stage_mul_solver_results_10_005_005_000_002`
- Two-stage pipeline
- Multi-solver consensus
- 10 attempts
- SQL temperature: 0.05 base + 0.05 increment
- Code temperature: 0.00 base + 0.02 increment
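
A results directory name can be decoded back into its settings. The sketch below is an illustration only: the regular expression and the helper names `decode_temperature` and `parse_result_name` are hypothetical, and the temperature encoding (first digit is the integer part, remaining digits are decimals, so `005` means 0.05) is inferred from the example above.

```python
import re

NAME_PATTERN = re.compile(
    r"^majority_(?P<stages>[a-z0-9]+)_stage_(?P<solver>[a-z]+)_solver_results_"
    r"(?P<attempts>\d+)_(?P<sql_base>\d+)_(?P<sql_inc>\d+)"
    r"_(?P<code_base>\d+)_(?P<code_inc>\d+)$"
)


def decode_temperature(token: str) -> float:
    """Assumed encoding: '005' -> 0.05, '000' -> 0.0, and a bare '0' -> 0.0."""
    return float(f"{token[0]}.{token[1:] or '0'}")


def parse_result_name(name: str) -> dict:
    """Split a results directory name into its pipeline settings."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized result directory name: {name}")
    fields = match.groupdict()
    return {
        "stages": fields["stages"],
        "solver": fields["solver"],
        "attempts": int(fields["attempts"]),
        "sql_temp_base": decode_temperature(fields["sql_base"]),
        "sql_temp_increment": decode_temperature(fields["sql_inc"]),
        "code_temp_base": decode_temperature(fields["code_base"]),
        "code_temp_increment": decode_temperature(fields["code_inc"]),
    }


config = parse_result_name("majority_two_stage_mul_solver_results_10_005_005_000_002")
print(config)
```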

## Evaluation Scripts
```
├── comprehensive_baseline_evaluation.py      # Evaluate all Text-to-OPT methods (fixed valid cases)
├── comprehensive_baseline_evaluation_flex.py # Flexible comparison tool
```

### Key Difference Between Evaluation Scripts

- **comprehensive_baseline_evaluation.py**: Uses a fixed set of valid cases. Valid cases depend only on the quality of the synthetic dataset's ground truth, which keeps comparisons consistent across different baseline configurations
- **comprehensive_baseline_evaluation_flex.py**: Flexible script for comparing any subset of methods, useful for ablation studies
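
The idea behind a fixed valid-case set can be sketched as follows. This is a simplified illustration, not the logic of `comprehensive_baseline_evaluation.py`: it assumes ground truth is a mapping from problem name to an optimal value (or `None` when the ground truth is unusable), and that a prediction counts as correct when it matches within a relative tolerance. The function names and the `rel_tol` parameter are hypothetical.

```python
import math
from typing import Optional


def fixed_valid_cases(ground_truth: dict[str, Optional[float]]) -> list[str]:
    """Valid cases are chosen from the dataset alone, never from a method's output."""
    return sorted(name for name, value in ground_truth.items() if value is not None)


def accuracy(predictions: dict[str, Optional[float]],
             ground_truth: dict[str, Optional[float]],
             rel_tol: float = 1e-4) -> float:
    """Fraction of valid cases a method solved; missing predictions count as wrong."""
    valid = fixed_valid_cases(ground_truth)
    if not valid:
        return 0.0
    correct = sum(
        1
        for name in valid
        if predictions.get(name) is not None
        and math.isclose(predictions[name], ground_truth[name], rel_tol=rel_tol)
    )
    return correct / len(valid)


ground_truth = {"activity_1": 10.0, "concert_singer": 25.0, "broken_case": None}
method_a = {"activity_1": 10.0, "concert_singer": 24.0}
method_b = {"activity_1": 10.0, "concert_singer": 25.0, "broken_case": 3.0}

# Both methods are scored against the same two valid cases.
print(accuracy(method_a, ground_truth), accuracy(method_b, ground_truth))  # 0.5 1.0
```

Because the denominator is the same for every method, a method cannot raise its score by failing on hard problems or by producing output for cases with unusable ground truth.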

## Summary Files
```
├── comprehensive_evaluation_results/    # Final comparison metrics
├── eval_four_models/                   # Foundation model comparison results
└── stage_1_enhanced_problems/stage1_summary.json  # Stage 1 statistics
```

# Repository Structure Overview

```
baselines/
├── # Method Implementations
│   ├── majority_2_stage_mul_solver/               # Data2Decision main implementation
│   ├── majority_2_stage_sing_solver/              # Data2Decision single solver variant
│   ├── majority_3_stage_mul_solver/               # Data2Decision three-stage variant
│   ├── or_llm_agent/                              # OR-LLM-Agent implementation
│   ├── OptiMUS/                                   # OptiMUS implementation
│   ├── simple_zero_shot/                          # Simple Zero-Shot implementation
│   ├── Chain-of-Experts/                          # Chain-of-Experts implementation
│   ├── baseline_call_method/                      # Unified foundation model runner
│   │
├── # Evaluation Infrastructure
│   ├── stage1_sql_retrieval.py                    # Stage 1 preprocessing script
│   ├── comprehensive_baseline_evaluation.py       # Main evaluation

