# Subagent Task: Create Graph Converter for MiniZinc Problem
**Prompt Version: v_02**

## Your Task
Create a Python script that converts MiniZinc problem instances into NetworkX graphs suitable for machine learning-based algorithm selection.

**⚠️ IMPORTANT: Testing is MANDATORY! Your converter MUST be tested with JSON input before submission. See the "Testing Your Converter (MANDATORY)" section below.**

## Context
You are in a folder containing:
- One `.mzn` file: The problem model
- Multiple `.dzn` files: Different problem instances
- **JSON data files**: Pre-converted DZN data in JSON format (same name, `.json` extension)

Your script will receive:
1. Path to MZN file (for reference only - you may not need to read it)
2. JSON data dictionary (already parsed from DZN files)

The script must convert the JSON data into a graph that captures the problem's essential structure for predicting solver performance. The graph is built on-the-fly during feature extraction - no intermediate files are saved.

**You will test your converter using the JSON files that exist in the directory.**

## How Your Graph Will Be Used
The graph you create will be analyzed by sophisticated feature extraction algorithms that measure:
- Graph-theoretic properties (connectivity, centrality, clustering)
- Constraint structure (tightness, overlap, propagation)
- Statistical distributions of weights and degrees
- Community and modularity patterns
- Bottlenecks and critical components

**Key principle: Create rich, informative graphs that expose problem structure**
- Different node types enable type-specific analysis
- Varied weights enable statistical feature extraction
- Proper constraint modeling enables constraint-based features
- The richer your graph representation, the more patterns ML can discover

## Required Graph Schema

### Graph Type
```python
import networkx as nx
G = nx.Graph()  # Undirected graph
```

### Node Attributes (keep it simple)
```python
G.add_node(node_id,
    type=0,        # 0=variable-like, 1=constraint-like, 2=resource-like
    weight=0.7     # Importance/difficulty [0,1]
)
```

### Edge Attributes  
```python
G.add_edge(node1, node2,
    weight=0.8     # Strength/tightness of relationship [0,1]
)
```

## Your Script Structure

Create a file named `converter.py`:

```python
#!/usr/bin/env python3
"""
Graph converter for [Problem Name] problem.
Created using subagent_prompt.md version: v_02

This problem is about [brief description].
Key challenges: [what makes it hard]
"""

import sys
import json
import math
import networkx as nx
from pathlib import Path


def build_graph(mzn_file, json_data):
    """
    Build graph representation of the problem instance.
    
    Args:
        mzn_file: Path to .mzn file (for reference)
        json_data: Dict containing parsed DZN data
    
    Strategy: [Explain your approach]
    - What are the key entities?
    - What relationships matter for solving?
    - What makes instances hard?
    """
    # Access data directly from json_data dict
    n = json_data.get('n', 0)
    items = json_data.get('items', [])
    
    # Create graph
    G = nx.Graph()
    
    # Add nodes based on problem structure
    # Think: What are the decision points? What constrains them?
    
    # Add edges for relationships
    # Think: What conflicts exist? What depends on what?
    
    return G


def main():
    if len(sys.argv) != 4:
        print("Usage: python converter.py <mzn_file> <dzn_file> <json_file>")
        sys.exit(1)
    
    mzn_file = sys.argv[1]
    dzn_file = sys.argv[2]
    json_file = sys.argv[3]
    
    # Load JSON data
    with open(json_file, 'r') as f:
        json_data = json.load(f)
    
    # Build graph
    G = build_graph(mzn_file, json_data)
    
    # Graph is returned by build_graph for direct feature extraction
    print(f"Graph built: {G.number_of_nodes()} nodes, {G.number_of_edges()} edges")


if __name__ == "__main__":
    main()
```

## Critical Design Principles (MUST READ!)

### Graph Model Choice
**PREFER BIPARTITE GRAPHS** for constraint satisfaction problems:
- Create explicit constraint nodes (type 1) for ALL constraints
- Connect variables (type 0) to constraints they participate in
- Avoid variable-to-variable conflict edges unless absolutely necessary
- This enables better analysis of constraint structure and interactions

### Weight Design Is Critical
**Weights must be meaningful and varied:**
- Node weights should reflect importance/criticality (not all 1.0!)
- Edge weights should represent relationship strength
- Use problem-specific metrics (e.g., constraint tightness, value density)
- Normalize to [0,1] but maintain relative differences
- Consider edge cases: ensure no division by zero (use `max(denominator, 1)` or conditionals)
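
The normalization and division-by-zero points above can be sketched as two small helpers. The names `safe_ratio` and `normalize` and the `0.5` fallback are illustrative assumptions, not part of any required API:

```python
def safe_ratio(numerator, denominator, default=0.5):
    """Return numerator/denominator clamped to [0, 1].

    Falls back to `default` (an assumed neutral value) when the
    denominator is zero or negative, so degenerate instances never
    raise ZeroDivisionError.
    """
    if denominator <= 0:
        return default
    return min(max(numerator / denominator, 0.0), 1.0)


def normalize(values):
    """Min-max scale a list to [0, 1], preserving relative differences.

    A constant list maps to all 0.5 instead of dividing by zero.
    """
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5] * len(values)
    return [(v - lo) / (hi - lo) for v in values]
```

Clamping keeps every weight inside the schema's [0,1] range even when a ratio such as demand over capacity exceeds 1.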

### Non-Linear Weighting
**Avoid simple linear scaling for all weights:**
- Don't just use `value / max_value` everywhere
- Consider how impact changes non-linearly with scale
- Use `math.exp()`, `math.log()`, `math.sqrt()` for appropriate relationships
- Examples: exponential decay for distances, logarithmic scaling for sizes
- This captures real-world relationships more accurately
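
As a rough illustration of the three functions named above, the following helpers each map a raw quantity to [0,1]. The helper names and the decay rate of `5.0` are assumptions chosen for the sketch; tune them per problem:

```python
import math


def decay_weight(distance, max_distance, rate=5.0):
    """Exponential decay: nearby pairs get weights near 1, distant pairs near 0."""
    if max_distance <= 0:
        return 1.0
    return math.exp(-rate * distance / max_distance)


def log_size_weight(size, max_size):
    """Logarithmic scaling: differences among small sizes count for more."""
    if size <= 0 or max_size <= 0:
        return 0.0
    return min(math.log1p(size) / math.log1p(max_size), 1.0)


def sqrt_weight(ratio):
    """Square-root scaling: spreads out small ratios, stays in [0, 1]."""
    return math.sqrt(min(max(ratio, 0.0), 1.0))
```

For instance, `log_size_weight(10, 100)` is noticeably larger than the linear `10 / 100`, so small and medium sizes stay distinguishable.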

## Guidelines for Creating Your Converter

### 1. Understand the Problem Domain
- What is being optimized/satisfied?
- What are the scarce resources?
- What causes conflicts?
- **What features predict instance difficulty?**

### 2. Identify Key Entities
Map problem elements to node types:
- **Type 0 (Variable-like)**: Entities making decisions
  - Examples: jobs, items, vertices, cells, customers
  - Weight by: importance, value, centrality, demand
- **Type 1 (Constraint-like)**: Rules/restrictions
  - Examples: capacity limits, precedences, conflicts, all-different
  - Weight by: tightness, scope (# variables), criticality
  - **CREATE ONE NODE PER CONSTRAINT, not one for all constraints!**
- **Type 2 (Resource-like)**: Shared/limited resources
  - Examples: machines, colors, time slots, vehicles
  - Weight by: scarcity, capacity utilization

### 3. Determine Important Relationships
Create edges that model problem structure:
- **Participation**: Variable participates in constraint (bipartite edge)
- **Dependencies**: Ordering or prerequisite relationships
- **Resource consumption**: How much resource each variable uses
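
To make the three node types and the relationship kinds above concrete, here is a minimal sketch for a toy scheduling instance. Every key in `data` (`durations`, `machine_of`, `precedences`, `n_machines`) is hypothetical, indices are assumed to be 0-based, and durations are assumed non-negative; real instances will use whatever names the `.mzn` model defines:

```python
import networkx as nx


def build_toy_schedule_graph(data):
    """Toy example: jobs (type 0), precedences (type 1), machines (type 2)."""
    durations = data.get('durations', [])      # one duration per job
    machine_of = data.get('machine_of', [])    # machine index per job
    precedences = data.get('precedences', [])  # [before, after] job pairs
    n_machines = data.get('n_machines', 0)
    n_jobs = len(durations)

    max_dur = max(durations, default=0) or 1
    total = sum(durations) or 1

    # Total load placed on each machine
    loads = [0] * n_machines
    for d, m in zip(durations, machine_of):
        if 0 <= m < n_machines:
            loads[m] += d

    G = nx.Graph()
    # Type 0: one node per job, weighted by relative duration
    for j, d in enumerate(durations):
        G.add_node(f'job_{j}', type=0, weight=d / max_dur)
    # Type 2: one node per machine, weighted by its share of the total load
    for m in range(n_machines):
        G.add_node(f'machine_{m}', type=2, weight=loads[m] / total)
    # Resource consumption: job to machine, weighted by share of machine load
    for j, m in enumerate(machine_of[:n_jobs]):
        if 0 <= m < n_machines:
            share = durations[j] / loads[m] if loads[m] > 0 else 0.5
            G.add_edge(f'job_{j}', f'machine_{m}', weight=share)
    # Type 1: one node per precedence, linked to both jobs (participation)
    for k, pair in enumerate(precedences):
        before, after = pair
        if 0 <= before < n_jobs and 0 <= after < n_jobs:
            c = f'prec_{k}'
            G.add_node(c, type=1,
                       weight=(durations[before] + durations[after]) / (2 * max_dur))
            G.add_edge(f'job_{before}', c, weight=durations[before] / max_dur)
            G.add_edge(f'job_{after}', c, weight=durations[after] / max_dur)
    return G
```

The weighting choices here are placeholders; the point is only that each entity kind gets its own node type and each relationship kind its own edge.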

#### When to Use Conflict Edges (Type 0 to Type 0)
**Use sparingly for direct incompatibilities:**
- Only when two variables directly conflict WITHOUT sharing a constraint node
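- Example: Two items that cannot both fit, where no explicit constraint node captures the incompatibility
- Weight should reflect degree of conflict and stay in [0,1] (e.g., `min((demand1 + demand2) / (2 * capacity), 1.0)`; the unscaled ratio `(demand1 + demand2) / capacity` exceeds 1 whenever the items actually conflict)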
- Most conflicts should be modeled via shared Type 1 constraint nodes instead

### 4. Advanced Techniques (from our best converters)
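- **Constraint Tightness**: Weight = `1.0 - capacity / total_demand` when `total_demand > capacity`; the expression goes negative otherwise, so fall back to a fixed value (the Multi-Knapsack example below uses 0.5)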
- **Conflict Detection**: Add edges between items competing for oversubscribed resources
- **Value Density**: Highlight high-value/low-cost items
- **Non-linear Distance Weights**: Use exponential decay for geographic problems
- **Scope-based Weights**: Weight constraints by number of variables involved
- **Global Complexity Node** (optional): Consider adding a single Type 1 node representing overall problem difficulty, connected to key variables/constraints with weights reflecting their contribution to complexity
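
The optional global complexity node could be attached after the rest of the graph is built. The sketch below is one possible reading of that idea: it assumes a constraint's contribution to complexity is simply its own `weight`, links only to the `top_k` heaviest constraint nodes, and uses the hypothetical node id `global_complexity`. Note that linking a type 1 node to other type 1 nodes means the graph is no longer strictly bipartite.

```python
import networkx as nx


def add_global_complexity_node(G, top_k=10, node_id='global_complexity'):
    """Attach one extra type-1 node to the heaviest existing constraint nodes."""
    # Collect existing constraint nodes before adding the new one
    constraints = [(n, d.get('weight', 0.0))
                   for n, d in G.nodes(data=True) if d.get('type') == 1]
    if not constraints:
        return G  # nothing to summarize

    constraints.sort(key=lambda item: item[1], reverse=True)
    mean_weight = sum(w for _, w in constraints) / len(constraints)

    # Node weight: mean constraint weight (assumed proxy for overall difficulty)
    G.add_node(node_id, type=1, weight=min(max(mean_weight, 0.0), 1.0))
    # Edge weight: the constraint's own weight, clamped to [0, 1]
    for n, w in constraints[:top_k]:
        G.add_edge(node_id, n, weight=min(max(w, 0.0), 1.0))
    return G
```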

## Examples for Different Problem Types

### GOOD Example: Multi-Knapsack (Sophisticated)
```python
def build_graph(mzn_file, json_data):
    """Shows advanced techniques: tightness, conflicts, value density"""
    n = json_data.get('N', 0)  # items
    m = json_data.get('M', 0)  # constraints
    c = json_data.get('c', [])  # values
    b = json_data.get('b', [])  # capacities
    a = json_data.get('a', [])  # coefficients
    
    G = nx.Graph()
    
    # Variable nodes with value-based weights
    max_value = max(c) if c and max(c) > 0 else 1  # Guard against all-zero values
    for i in range(n):
        value = c[i] if i < len(c) else 0
        G.add_node(f'item_{i}', type=0, weight=value/max_value)
    
    # Constraint nodes with tightness-based weights
    for j in range(m):
        capacity = b[j] if j < len(b) else 1
        # Calculate tightness
        total_demand = sum(a[j]) if j < len(a) else 0
        tightness = 1.0 - (capacity/total_demand) if total_demand > capacity else 0.5
        G.add_node(f'constraint_{j}', type=1, weight=tightness)
    
    # Edges with consumption-based weights
    for j in range(m):
        for i in range(n):
            if j < len(a) and i < len(a[j]) and a[j][i] > 0:
                consumption_ratio = a[j][i] / b[j] if j < len(b) and b[j] > 0 else 0.5
                G.add_edge(f'item_{i}', f'constraint_{j}', 
                          weight=min(consumption_ratio * 2, 1.0))
    
    # Add conflict edges for oversubscribed constraints
    for j in range(min(m, len(a), len(b))):
        # Skip non-positive capacities (avoids division by zero) and
        # constraints that are not oversubscribed
        if b[j] <= 0 or sum(a[j]) <= b[j] * 1.5:
            continue
        items_in_constraint = [(i, coeff) for i, coeff in enumerate(a[j][:n]) if coeff > 0]
        items_in_constraint.sort(key=lambda x: x[1], reverse=True)
        # Add conflicts between the top 5 consumers
        top = items_in_constraint[:5]
        for idx1 in range(len(top)):
            for idx2 in range(idx1 + 1, len(top)):
                i1, coeff1 = top[idx1]
                i2, coeff2 = top[idx2]
                if coeff1 + coeff2 > b[j]:  # Can't both fit
                    # Clamp: the raw ratio can exceed 1 for large coefficients
                    G.add_edge(f'item_{i1}', f'item_{i2}',
                               weight=min((coeff1 + coeff2) / (2 * b[j]), 1.0))
    
    return G
```

### GOOD Example: N-Queens (Proper Bipartite)
```python
def build_graph(mzn_file, json_data):
    """Pure bipartite model with all constraints explicit"""
    n = json_data.get('n', 8)
    G = nx.Graph()
    
    # Variable nodes (board positions)
    for r in range(n):
        for c in range(n):
            # Central positions are more constrained
            centrality = 1.0 - (abs(r - n//2) + abs(c - n//2)) / n
            G.add_node(f'pos_{r}_{c}', type=0, weight=centrality)
    
    # Constraint nodes for EVERY constraint
    # Row constraints (scope = n)
    for r in range(n):
        G.add_node(f'row_{r}', type=1, weight=1.0)
    # Column constraints (scope = n)  
    for c in range(n):
        G.add_node(f'col_{c}', type=1, weight=1.0)
    # Diagonal constraints (variable scope)
    for d in range(2*n-1):
        scope = n - abs(d - (n-1))
        G.add_node(f'diag_{d}', type=1, weight=scope/n)
        G.add_node(f'antidiag_{d}', type=1, weight=scope/n)
    
    # Bipartite edges: variable-constraint participation
    for r in range(n):
        for c in range(n):
            var = f'pos_{r}_{c}'
            G.add_edge(var, f'row_{r}', weight=1.0)
            G.add_edge(var, f'col_{c}', weight=1.0)
            G.add_edge(var, f'diag_{r-c+n-1}', weight=1.0)
            G.add_edge(var, f'antidiag_{r+c}', weight=1.0)
    
    return G
```

### BAD Example: Graph Coloring (Missing Constraints)
```python
def build_graph(mzn_file, json_data):
    """BAD: Only models conflicts, no explicit constraints"""
    n_vertices = json_data['n']
    edges = json_data['edges']
    
    G = nx.Graph()
    
    # Variables only
    for v in range(n_vertices):
        G.add_node(f'v_{v}', type=0, weight=0.5)  # BAD: uniform weights
    
    # Direct conflict edges (BAD: should use constraint nodes)
    for v1, v2 in edges:
        G.add_edge(f'v_{v1}', f'v_{v2}', weight=1.0)  # BAD: all same weight
    
    # MISSING: No type=1 constraint nodes!
    # Features like ConstraintClustering will fail
    
    return G
```
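
For contrast, a bipartite version of the same graph coloring instance might look like the sketch below. It keeps the BAD example's assumptions about the data (keys `n` and `edges`, 0-based vertex indices) and adds one type 1 node per input edge. The degree-based weights are an assumed heuristic, not a prescribed metric:

```python
import math
import networkx as nx


def build_coloring_graph(json_data):
    """Bipartite variant: one not-equal constraint node per input edge."""
    n_vertices = json_data.get('n', 0)
    # Keep only edges whose endpoints are valid vertex indices
    edges = [(v1, v2) for v1, v2 in json_data.get('edges', [])
             if 0 <= v1 < n_vertices and 0 <= v2 < n_vertices]

    # Degree of each vertex in the input graph drives the weights
    degree = [0] * n_vertices
    for v1, v2 in edges:
        degree[v1] += 1
        degree[v2] += 1
    max_degree = max(degree, default=0) or 1

    G = nx.Graph()
    for v in range(n_vertices):
        # High-degree vertices are harder to color; sqrt spreads low degrees
        G.add_node(f'v_{v}', type=0, weight=math.sqrt(degree[v] / max_degree))

    for k, (v1, v2) in enumerate(edges):
        c = f'neq_{k}'
        # Constraint weight: average relative degree of its two endpoints
        G.add_node(c, type=1, weight=(degree[v1] + degree[v2]) / (2 * max_degree))
        G.add_edge(f'v_{v1}', c, weight=degree[v1] / max_degree)
        G.add_edge(f'v_{v2}', c, weight=degree[v2] / max_degree)
    return G
```

Every conflict is now visible as a type 1 node, and both node and edge weights vary with the local structure of the input graph.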

## Common Pitfalls to AVOID

1. **Uniform Weights**: All nodes with weight=1.0 or all edges with weight=0.5
   - Statistical features will have zero variance
   - ML model can't distinguish between easy and hard instances

2. **Missing Constraint Nodes**: Only creating variable nodes (type 0)
   - Can't analyze constraint structure and interactions
   - Loses important problem information

3. **Single Constraint Node**: One node representing ALL constraints
   - Can't model individual constraint properties
   - Loses critical problem structure

4. **Linear Weight Functions**: Using simple (distance/max_distance)
   - Use exponential or other non-linear functions for better sensitivity
   - Example: `weight = math.exp(-5.0 * distance / max_distance)`

5. **Ignoring Problem Scale**: Not considering how weights should change with problem size
   - An N=100 queens problem needs different weighting than N=8
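
Pitfall 1 is easy to check numerically. The helper below is a minimal sketch (the name `weight_spread` is illustrative) that reports the standard deviation of node and edge weights; a value of `0.0` means every weight is identical and weight-based statistical features carry no signal:

```python
import statistics
import networkx as nx


def weight_spread(G):
    """Return the population standard deviation of node and edge weights."""
    node_w = [d.get('weight', 0.0) for _, d in G.nodes(data=True)]
    edge_w = [d.get('weight', 0.0) for _, _, d in G.edges(data=True)]
    return {
        'node_std': statistics.pstdev(node_w) if node_w else 0.0,
        'edge_std': statistics.pstdev(edge_w) if edge_w else 0.0,
    }
```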

## What Makes a Good Converter?

1. **Captures problem essence**: The graph structure should reflect what makes the problem hard
2. **Scales appropriately**: Large instances should have proportionally more nodes/edges
3. **Preserves difficulty indicators**: Bottlenecks and conflicts should be visible in the graph
4. **Uses node types properly**: Type 0 and type 1 nodes in every converter, type 2 nodes wherever the problem has shared or limited resources
5. **Meaningful weights**: Weights that vary and reflect problem-specific metrics
6. **Rich structure**: Provides enough information for any feature extraction algorithm

## Testing Your Converter (MANDATORY)

**IMPORTANT: You MUST test your converter before submitting it!**

Test standalone (from problem directory):
```bash
# JSON files should already exist (check with: ls *.json)
# If not, create one: uv run python ../../dzn_to_json.py instance.dzn problem.mzn > instance.json

# Then test converter - THIS STEP IS MANDATORY
# Pick any instance that has a JSON file
uv run python converter.py problem.mzn instance.dzn instance.json
```

Test with validation script (from problem directory):
```bash
# The test script looks for JSON files in the problem directory
cd ../.. # Go to project root
uv run python test_converter.py problem_filtered/YourProblem instance.dzn
```

Test with feature extraction (from project root):
```bash
uv run python extract_features_direct.py problem_filtered/YourProblem instance.dzn
```

**Your converter MUST**:
- Successfully load the JSON file
- Build a valid NetworkX graph
- Have all nodes with `type` (0, 1, or 2) and `weight` in [0,1]
- Have all edges with `weight` in [0,1]
- Output "Graph built: X nodes, Y edges" when run standalone
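
The schema requirements in this list can be checked mechanically before running the project's own test scripts. The following is a minimal sketch, not the validation script itself; `validate_graph` is a hypothetical helper name:

```python
import networkx as nx


def _bad_weight(w):
    """True if w is missing, non-numeric, NaN, or outside [0, 1]."""
    if isinstance(w, bool) or not isinstance(w, (int, float)):
        return True
    return not (0.0 <= w <= 1.0)


def validate_graph(G):
    """Return a list of schema violations; an empty list means the graph passes."""
    if not isinstance(G, nx.Graph) or G.is_directed():
        return ['graph must be an undirected nx.Graph']

    problems = []
    for node, data in G.nodes(data=True):
        if data.get('type') not in (0, 1, 2):
            problems.append(f'node {node!r}: type must be 0, 1, or 2')
        if _bad_weight(data.get('weight')):
            problems.append(f'node {node!r}: weight must be a number in [0, 1]')
    for u, v, data in G.edges(data=True):
        if _bad_weight(data.get('weight')):
            problems.append(f'edge ({u!r}, {v!r}): weight must be a number in [0, 1]')
    return problems
```

Calling `validate_graph(build_graph(mzn_file, json_data))` at the end of `main()` and printing any returned messages is one way to catch a missing attribute early.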

Check that:
- The converter runs without errors on at least one instance
- Harder instances have higher density or more conflicts
- Graph size scales with problem size
- Weights make semantic sense
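
To compare instances against the checklist above, it can help to print a few size figures per graph. This is a small sketch with an illustrative helper name; it only reports numbers and leaves the judgment about difficulty to you:

```python
import networkx as nx


def graph_summary(G):
    """Collect size figures for comparing instances side by side."""
    type_counts = {0: 0, 1: 0, 2: 0}
    for _, data in G.nodes(data=True):
        t = data.get('type')
        if t in type_counts:
            type_counts[t] += 1
    return {
        'nodes': G.number_of_nodes(),
        'edges': G.number_of_edges(),
        'density': nx.density(G),
        'type_counts': type_counts,
    }
```

Running it on a small and a large instance of the same problem should show node and edge counts growing with problem size.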

## Version Tracking

**IMPORTANT: Include the prompt version in your converter!**

Every converter MUST state, in its module docstring, which version of this prompt was used:
```python
"""
Graph converter for [Problem Name] problem.
Created using subagent_prompt.md version: v_02
...
"""
```

This helps track improvements and debug issues across different converter generations.

## Feedback and Environment Issues

**IMPORTANT: Report any issues you encounter!**

If you experience any of the following, document them in `./FEEDBACK.md`:
- Missing Python imports or packages
- Unclear or contradictory instructions
- Environment setup problems
- Testing script failures that seem unrelated to your code
- JSON conversion issues
- MiniZinc model parsing problems
- Any other blockers or confusion

Format for FEEDBACK.md:
```markdown
## Converter: [Problem Name]
### Issue: [Brief description]
**Details**: [Full explanation]
**Suggested Fix**: [If you have one]
---
```

This helps improve the environment and instructions for future converter development.

## Remember

- You're not translating the constraint model literally
- You're capturing the problem's **interaction structure**
- Focus on what influences **solving difficulty**
- Keep it simple - we want patterns that generalize
- **Use the JSON data directly** - no regex parsing needed!

Your domain expertise about what makes this specific problem hard is the key value you provide!