# MiniZinc to NetworkX Graph Extraction System

## Overview
A system for converting MiniZinc constraint problems (MZN + DZN files) into NetworkX graphs for machine learning-based algorithm selection and parameter tuning.

## Problem Statement
- We have numerous MiniZinc problems in the `problem_filtered/` directory
- Each problem has one `.mzn` model file and multiple `.dzn` data instance files
- Goal: Extract structured graphs that capture problem difficulty for ML analysis
- Challenge: Each problem type has unique semantics that affect solving difficulty

## Solution Architecture

### 1. Uniform Graph Schema
All problems map to the same simple graph structure:

**Graph Type**: `nx.Graph()` (undirected)

**Node Attributes**:
- `type`: int (0=variable-like, 1=constraint-like, 2=resource-like)
- `weight`: float in [0, 1] (importance/difficulty)

**Edge Attributes**:
- `weight`: float in [0, 1] (strength/tightness of relationship)
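
A minimal sketch of a graph that conforms to this schema. The node names (`x1`, `c1`, `r1`) and the weight values are illustrative assumptions, not output from any real converter.

```python
import networkx as nx

# Integer type codes from the schema above.
VARIABLE, CONSTRAINT, RESOURCE = 0, 1, 2

G = nx.Graph()  # undirected

# Every node carries an integer `type` and a float `weight` in [0, 1].
G.add_node("x1", type=VARIABLE, weight=0.5)
G.add_node("x2", type=VARIABLE, weight=0.5)
G.add_node("c1", type=CONSTRAINT, weight=0.8)
G.add_node("r1", type=RESOURCE, weight=0.3)

# Every edge carries a float `weight` in [0, 1].
G.add_edge("x1", "c1", weight=0.9)
G.add_edge("x2", "c1", weight=0.9)
G.add_edge("c1", "r1", weight=0.4)
```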

### 2. Subagent-Based Conversion
Each problem type gets a custom converter written by an LLM subagent:

**Input to Subagent**:
- MZN and DZN files for the problem
- `subagent_final_prompt.md` with instructions and schema
- Examples of good graph representations

**Output from Subagent**:
- `converter.py` - Custom Python script that:
  - Receives JSON data (pre-converted from DZN)
  - Understands problem semantics
  - Builds graph capturing problem structure
  - Applies domain-specific weights

### 3. Feature Extraction Pipeline
Once graphs are created:

1. **Validation** (`validate_graph.py`):
   - Checks schema conformance
   - Ensures weights are in [0,1]
   - Verifies required attributes

2. **Feature Extraction** (`feature_extractor.py`):
   - Computes 40+ graph features:
     - Structural: density, degree distribution, clustering
     - Component: connectivity, modularity
     - Weight-based: average weights, variance
   - Returns feature vector for ML
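
To make the feature categories concrete, here is a small sketch that computes a handful of them. It is not the code in `feature_extractor.py`; the function name `extract_basic_features` and the choice of features are assumptions.

```python
import statistics

import networkx as nx


def extract_basic_features(G: nx.Graph) -> dict:
    """Compute a few structural, component, and weight-based features."""
    n = G.number_of_nodes()
    degrees = [deg for _, deg in G.degree()]
    edge_w = [w for _, _, w in G.edges(data="weight", default=0.0)]
    node_w = [w for _, w in G.nodes(data="weight", default=0.0)]
    return {
        # Structural
        "n_nodes": n,
        "n_edges": G.number_of_edges(),
        "density": nx.density(G),
        "mean_degree": statistics.fmean(degrees) if degrees else 0.0,
        "avg_clustering": nx.average_clustering(G) if n else 0.0,
        # Component
        "n_components": nx.number_connected_components(G),
        # Weight-based
        "mean_node_weight": statistics.fmean(node_w) if node_w else 0.0,
        "mean_edge_weight": statistics.fmean(edge_w) if edge_w else 0.0,
        "var_edge_weight": statistics.pvariance(edge_w) if edge_w else 0.0,
    }


# Tiny example: a path a - b - c with two weighted edges.
path = nx.Graph()
path.add_edge("a", "b", weight=0.2)
path.add_edge("b", "c", weight=0.4)
features = extract_basic_features(path)
```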

## Key Components

### Core Files
- `subagent_final_prompt.md` - Instructions for subagents to create converters
- `feature_extractor.py` - Extracts ML features from graphs
- `validate_graph.py` - Validates graph conformance
- `base_converter.py` - Helper functions for parsing (optional use)

### Per-Problem Files (created by subagent)
- `problem_filtered/<problem>/converter.py` - Custom converter for that problem type

## Workflow

```
1. For each problem directory:
   │
   ├─> Subagent analyzes MZN/DZN files
   │
   ├─> Subagent writes converter.py
   │
   └─> For each DZN instance:
       │
       ├─> Run: python converter.py problem.mzn instance.dzn
       │
       ├─> Outputs: instance.gpickle (graph file)
       │
       ├─> Validate: python validate_graph.py instance.gpickle
       │
       └─> Extract: features = GraphFeatureExtractor(graph).extract_all_features()
```
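
The per-instance loop in this workflow could be driven by a short script along these lines. This is a sketch under assumptions: the helper names `plan_commands` and `run_all` are hypothetical, and it only covers the converter step, relying on the one-`.mzn`-per-directory layout described in the problem statement.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def plan_commands(problem_dir: Path) -> List[List[str]]:
    """Build one converter command per .dzn instance in a problem directory."""
    models = sorted(problem_dir.glob("*.mzn"))
    if len(models) != 1:
        raise ValueError(f"expected exactly one .mzn file in {problem_dir}")
    return [
        [sys.executable, "converter.py", models[0].name, dzn.name]
        for dzn in sorted(problem_dir.glob("*.dzn"))
    ]


def run_all(root: Path) -> None:
    """Run every planned command from inside its problem directory."""
    for problem_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for cmd in plan_commands(problem_dir):
            # check=True raises CalledProcessError if a converter fails.
            subprocess.run(cmd, cwd=problem_dir, check=True)
```

Calling `run_all(Path("problem_filtered"))` would then process every problem directory in turn; validation and feature extraction can be chained on after each converter run.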

## Why This Approach?

### Advantages
1. **Semantic Understanding**: Subagents understand what makes each problem hard
2. **Flexibility**: Each problem gets custom treatment while maintaining uniform output
3. **Simplicity**: Simple schema is easy to validate and analyze
4. **Scalability**: Can process hundreds of problem types in parallel
5. **Domain Expertise**: Subagents encode knowledge about problem structure

### Key Insights
- **Don't parse everything**: Focus on what affects solving difficulty
- **Abstract representation**: Capture interaction patterns, not syntactic details
- **Normalized weights**: Everything in [0,1] for ML compatibility
- **Problem-specific logic**: Each converter uses domain knowledge

## Example: Costas Array Problem

**Input**: n=16 (single parameter)

**Subagent Understanding**:
- n positions need distinct values
- Distance constraints at each separation
- All-different constraint is critical

**Graph Created**:
- Position nodes (type=0, weight by centrality)
- Distance constraint nodes (type=1, weight by tightness)
- Edges show which positions interact through constraints
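
One way such a graph could be built is sketched below. The weight formulas are assumptions chosen for illustration: a separation `d` is treated as tighter when more position pairs share it, and a position's weight is its degree normalized by the number of separations. The all-different constraint is not modeled as a separate node here.

```python
import networkx as nx


def costas_graph(n: int) -> nx.Graph:
    """Bipartite sketch: position nodes linked to the separations they take part in."""
    if n < 2:
        raise ValueError("n must be at least 2")
    G = nx.Graph()
    for i in range(n):
        G.add_node(f"pos_{i}", type=0, weight=0.0)  # weight filled in below

    for d in range(1, n):
        # n - d pairs of positions are d apart; assume more pairs means tighter.
        tightness = (n - d) / (n - 1)
        G.add_node(f"dist_{d}", type=1, weight=tightness)
        for i in range(n - d):
            # Both endpoints of the pair (i, i + d) interact through this constraint.
            G.add_edge(f"pos_{i}", f"dist_{d}", weight=tightness)
            G.add_edge(f"pos_{i + d}", f"dist_{d}", weight=tightness)

    for i in range(n):
        # Assumed centrality: fraction of separations this position takes part in.
        G.nodes[f"pos_{i}"]["weight"] = G.degree(f"pos_{i}") / (n - 1)
    return G


G16 = costas_graph(16)
```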

## Usage

### Running a Converter
```bash
cd problem_filtered/CostasArray
python converter.py CostasArray.mzn 16.dzn
# Creates: 16.gpickle
```

### Validating Output
```bash
python ../../validate_graph.py 16.gpickle
# Checks schema conformance
```
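
The checks performed at this step might resemble the following sketch. It is not the contents of `validate_graph.py`; the function name `find_schema_errors` and the error messages are assumptions.

```python
import numbers

import networkx as nx


def _is_valid_type(t) -> bool:
    # bool is a subclass of int, so exclude it explicitly.
    return isinstance(t, int) and not isinstance(t, bool) and t in (0, 1, 2)


def _is_valid_weight(w) -> bool:
    # NaN fails the range comparison, so it is rejected as well.
    return isinstance(w, numbers.Real) and not isinstance(w, bool) and 0.0 <= w <= 1.0


def find_schema_errors(G) -> list:
    """Return a list of human-readable schema violations (empty if the graph is valid)."""
    if not isinstance(G, nx.Graph) or G.is_directed() or G.is_multigraph():
        return ["graph must be an undirected nx.Graph"]
    errors = []
    for node, attrs in G.nodes(data=True):
        if not _is_valid_type(attrs.get("type")):
            errors.append(f"node {node!r}: type must be 0, 1, or 2")
        if not _is_valid_weight(attrs.get("weight")):
            errors.append(f"node {node!r}: weight must be a number in [0, 1]")
    for u, v, attrs in G.edges(data=True):
        if not _is_valid_weight(attrs.get("weight")):
            errors.append(f"edge ({u!r}, {v!r}): weight must be a number in [0, 1]")
    return errors


# Example with two deliberate violations.
bad = nx.Graph()
bad.add_node("x", type=0, weight=0.5)
bad.add_node("c", type=1, weight=1.5)  # node weight out of range
bad.add_edge("x", "c")                 # edge weight missing
errors = find_schema_errors(bad)
```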

### Extracting Features
```python
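import pickle

import networkx as nx  # must be installed so the pickled Graph can be loaded
from feature_extractor import GraphFeatureExtractor

# nx.read_gpickle was removed in NetworkX 3.0; a .gpickle file is a plain pickle.
with open('16.gpickle', 'rb') as f:
    G = pickle.load(f)

extractor = GraphFeatureExtractor(G)
features = extractor.extract_all_features()
# Returns dict with 40+ features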
```

## Alternative Approaches Considered

1. **FlatZinc parsing**: Too low-level, loses semantic meaning
2. **Direct MiniZinc parsing**: Too complex; interpreting a model correctly requires most of a full CP toolchain
3. **Fixed templates**: Too rigid, can't capture problem-specific insights
4. **Manual conversion**: Not scalable to hundreds of problems

## Conclusion

This system uses LLM subagents to provide semantic understanding while maintaining a uniform output format suitable for machine learning. Each problem gets custom treatment based on its unique characteristics, but every converter produces a graph that follows the same simple schema.