# Complete Optimization Problem and Solution: protein_institute

## 1. Problem Context and Goals

### Context  
The protein institute allocates research resources across several institutions. Its objective is to maximize total sequence identity to human proteins, where sequence identity measures how closely a protein sequence matches its human counterpart. The decision variables are the amounts of resources assigned to each institution, and they are continuous.

The objective is linear: total sequence identity is the sum, over institutions, of each institution's sequence identity coefficient multiplied by the resources allocated to it. The business configuration supplies three kinds of parameters:

- the total resources available, which caps the sum of all allocations;
- the sequence identity coefficient of each institution, which is its objective coefficient;
- the capacity of the building associated with each institution, which caps that institution's allocation.

The data reflects current operational values. Every resource limit is a linear bound on the allocations, so the model involves no products or ratios of decision variables.

### Goals  
The goal is to maximize the total sequence identity to human proteins across all institutions by choosing how much of the research resources each institution receives. The metric is the total sequence identity: each institution's sequence identity coefficient multiplied by its allocated resources, summed over all institutions. An allocation is better the higher this total is, provided it satisfies the constraints in Section 2.
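
To make the metric concrete, the sketch below evaluates it for the allocation currently stored in the database (see Section 3). The dictionary names and the helper `total_sequence_identity` are illustrative, and pairing coefficients with institutions by row order is an assumption, since the `protein` table carries no institution key.

```python
# Values copied from the "Current Stored Values" section.
# Assumption: the i-th protein row belongs to institution i.
coefficients = {1: 0.85, 2: 0.9, 3: 0.8}
current_allocation = {1: 150.0, 2: 250.0, 3: 200.0}


def total_sequence_identity(allocation, coefficients):
    """Sum of (coefficient * allocated resources) over all institutions."""
    return sum(coefficients[i] * allocation[i] for i in allocation)


current_total = total_sequence_identity(current_allocation, coefficients)
print(f"Total sequence identity of the stored allocation: {current_total:.1f}")
```

The stored allocation therefore serves as a baseline against which any optimized allocation can be compared.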

## 2. Constraints    

The problem has two kinds of linear constraints:

- **Total resources**: the resources allocated across all institutions must not exceed the total resources available, as defined in the business configuration.
- **Building capacity**: the resources allocated to each institution must not exceed the capacity of the building associated with it, which reflects the physical limits of that institution's infrastructure.

Both are simple upper bounds on allocations or their sum, so neither introduces a nonlinear relationship.
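
The two constraints can be checked for any candidate allocation with a few lines of code. The following is a minimal sketch; the function name `check_allocation`, the tolerance, and the deliberately infeasible example are hypothetical, and the total of 600 is the assumed business configuration value used later in this document.

```python
def check_allocation(allocation, capacities, total_available, tol=1e-9):
    """Return a list of violation messages; an empty list means feasible."""
    violations = []
    total = sum(allocation.values())
    if total > total_available + tol:
        violations.append(f"total {total} exceeds available {total_available}")
    for inst, amount in allocation.items():
        if amount < -tol:
            violations.append(f"institution {inst}: negative allocation {amount}")
        if amount > capacities[inst] + tol:
            violations.append(
                f"institution {inst}: {amount} exceeds capacity {capacities[inst]}"
            )
    return violations


capacities = {1: 600, 2: 700, 3: 500}
total_available = 600  # assumed value from the business configuration

# The allocation currently stored in the database is feasible.
stored_violations = check_allocation(
    {1: 150.0, 2: 250.0, 3: 200.0}, capacities, total_available
)

# A hypothetical allocation that breaks both kinds of constraint.
bad_violations = check_allocation(
    {1: 0.0, 2: 800.0, 3: 0.0}, capacities, total_available
)

print(stored_violations)
print(bad_violations)
```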

## 3. Available Data  

### Database Schema  
```sql
-- Iteration 1 Database Schema
-- Objective: Schema changes include creating new tables for missing optimization data, modifying existing tables to improve mapping adequacy, and updating business configuration logic for scalar parameters and formulas.

CREATE TABLE ResourceAllocation (
  institution_id INTEGER,
  allocation FLOAT
);

CREATE TABLE protein (
  sequence_identity_coefficient FLOAT
);

CREATE TABLE building (
  building_capacity INTEGER
);
```

### Data Dictionary  
The data dictionary provides a comprehensive mapping of tables and columns to their business purposes and optimization roles:

- **ResourceAllocation Table**: Represents the allocation of resources to each institution. The `institution_id` column serves as a unique identifier for each institution, linking the allocation to specific institutions. The `allocation` column represents the amount of resources allocated to the institution, serving as the decision variable for resource allocation.

- **Protein Table** (`protein`): Stores protein data. The `sequence_identity_coefficient` column holds the coefficient for sequence identity to human proteins and serves as the objective coefficient. The table has no `institution_id` column, so this document matches rows to institutions 1 to 3 by insertion order.

- **Building Table** (`building`): Stores building data. The `building_capacity` column holds the capacity of the building associated with each institution and serves as the upper bound in the building capacity constraint. As with `protein`, rows are matched to institutions by insertion order.

### Current Stored Values  
```sql
-- Iteration 1 Realistic Data
-- Generated by triple expert (business + data + optimization)
-- Values were determined based on typical research resource allocation scenarios, ensuring that the total resources and building capacities align with realistic institutional capabilities.

-- Realistic data for ResourceAllocation
INSERT INTO ResourceAllocation (institution_id, allocation) VALUES (1, 150.0);
INSERT INTO ResourceAllocation (institution_id, allocation) VALUES (2, 250.0);
INSERT INTO ResourceAllocation (institution_id, allocation) VALUES (3, 200.0);

-- Realistic data for protein
INSERT INTO protein (sequence_identity_coefficient) VALUES (0.85);
INSERT INTO protein (sequence_identity_coefficient) VALUES (0.9);
INSERT INTO protein (sequence_identity_coefficient) VALUES (0.8);

-- Realistic data for building
INSERT INTO building (building_capacity) VALUES (600);
INSERT INTO building (building_capacity) VALUES (700);
INSERT INTO building (building_capacity) VALUES (500);
```
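
The stored values can be pulled into solver-ready Python structures as sketched below, using an in-memory SQLite database as a stand-in for the real one. Two assumptions are made and marked in the code: rows of `protein` and `building` are paired with institutions by insertion order (`rowid`), because neither table has an institution key, and the total available resources (600) is a scalar taken from the business configuration rather than from any table.

```python
import sqlite3

# In-memory stand-in for the real database, populated with the values above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ResourceAllocation (institution_id INTEGER, allocation FLOAT);
CREATE TABLE protein (sequence_identity_coefficient FLOAT);
CREATE TABLE building (building_capacity INTEGER);
INSERT INTO ResourceAllocation VALUES (1, 150.0), (2, 250.0), (3, 200.0);
INSERT INTO protein VALUES (0.85), (0.9), (0.8);
INSERT INTO building VALUES (600), (700), (500);
""")


def first_column(sql):
    """Run a query and return its first column as a list."""
    return [row[0] for row in conn.execute(sql)]


# Assumption: insertion order (rowid) defines which row belongs to which institution.
institution_ids = first_column(
    "SELECT institution_id FROM ResourceAllocation ORDER BY rowid"
)
coefficients = first_column(
    "SELECT sequence_identity_coefficient FROM protein ORDER BY rowid"
)
capacities = first_column("SELECT building_capacity FROM building ORDER BY rowid")

if not (len(institution_ids) == len(coefficients) == len(capacities)):
    raise ValueError("Tables have different row counts; cannot pair by order")

model_data = {
    inst: {"coefficient": coef, "capacity": cap}
    for inst, coef, cap in zip(institution_ids, coefficients, capacities)
}
total_resources_available = 600  # assumed scalar, not stored in any table

print(model_data)
conn.close()
```

Adding an `institution_id` column to `protein` and `building` would remove the dependence on row order, but that is a schema change outside the scope of this document.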

## 4. Mathematical Optimization Formulation

### Decision Variables
- Let \( x_i \) be the amount of resources allocated to institution \( i \).
  - \( x_1 \) for institution 1
  - \( x_2 \) for institution 2
  - \( x_3 \) for institution 3

### Objective Function
Maximize the total sequence identity:
\[ \text{Maximize } 0.85x_1 + 0.9x_2 + 0.8x_3 \]

### Constraints
1. Total Resource Constraint:
   \[ x_1 + x_2 + x_3 \leq 600 \]
   - This constraint ensures that the total resources allocated do not exceed the total available resources.

2. Building Capacity Constraints:
   \[ x_1 \leq 600 \]
   \[ x_2 \leq 700 \]
   \[ x_3 \leq 500 \]
   - These constraints ensure that the resources allocated to each institution do not exceed the capacity of their respective buildings.

3. Non-negativity:
   \[ x_1, x_2, x_3 \geq 0 \]
   - Allocations cannot be negative. All three implementations below enforce this through a lower bound of zero on each variable.

Data Source Verification:
- Objective coefficients:
  - \( 0.85 \) from `protein.sequence_identity_coefficient` for institution 1
  - \( 0.9 \) from `protein.sequence_identity_coefficient` for institution 2
  - \( 0.8 \) from `protein.sequence_identity_coefficient` for institution 3
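- Total resource constraint: the right-hand side \( 600 \) comes from the business configuration and is an assumed value; it is not stored in any of the tables above. It equals the sum of the stored allocations, \( 150 + 250 + 200 \).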
- Building capacity constraints:
  - \( 600 \) from `building.building_capacity` for institution 1
  - \( 700 \) from `building.building_capacity` for institution 2
  - \( 500 \) from `building.building_capacity` for institution 3

This is a small linear program with three continuous variables and four linear constraints, so any standard linear programming method solves it directly and returns an allocation that maximizes total sequence identity while respecting every constraint.
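
Because every variable has the same weight (1) in the single budget constraint, this particular model is a continuous knapsack problem, and a greedy rule solves it exactly: fill institutions in descending order of coefficient, each up to its capacity, until the budget runs out. The sketch below is a solver-free cross-check of the formulation, not a replacement for the implementations that follow; the function name `greedy_allocation` is hypothetical.

```python
def greedy_allocation(coefficients, capacities, total_available):
    """Maximize sum(c_i * x_i) s.t. sum(x_i) <= T and 0 <= x_i <= cap_i.

    Valid only because each x_i has unit weight in the single budget
    constraint; it is not a general LP method.
    """
    allocation = [0.0] * len(coefficients)
    remaining = float(total_available)
    # Visit institutions from highest to lowest objective coefficient.
    order = sorted(
        range(len(coefficients)), key=lambda i: coefficients[i], reverse=True
    )
    for i in order:
        # Stop when the budget is spent or no remaining coefficient helps.
        if remaining <= 0 or coefficients[i] <= 0:
            break
        allocation[i] = min(capacities[i], remaining)
        remaining -= allocation[i]
    objective = sum(c * x for c, x in zip(coefficients, allocation))
    return allocation, objective


allocation, objective = greedy_allocation(
    coefficients=[0.85, 0.9, 0.8],
    capacities=[600, 700, 500],
    total_available=600,
)
print(allocation, objective)
```

Institution 2 has the largest coefficient (0.9) and a capacity of 700, which exceeds the budget of 600, so the greedy rule assigns it the entire budget.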

## 5. Gurobipy Implementation

```python
# Complete GUROBIPY implementation

import gurobipy as gp
from gurobipy import GRB

def optimize_protein_institute():
    """Optimize resource allocation to maximize sequence identity"""
    
    # 1. MODEL & DATA SETUP
    model = gp.Model("protein_institute")
    
    # Data: Sequence identity coefficients and building capacities
    sequence_identity_coefficients = [0.85, 0.9, 0.8]
    building_capacities = [600, 700, 500]
    total_resources_available = 600
    
    # Validate array lengths
    n_institutions = len(sequence_identity_coefficients)
    assert len(building_capacities) == n_institutions, "Array length mismatch"
    
    # 2. VARIABLES
    # Decision variables for resource allocation
    x = {i: model.addVar(vtype=GRB.CONTINUOUS, name=f"x_{i+1}", lb=0) 
         for i in range(n_institutions)}
    
    # 3. OBJECTIVE FUNCTION
    # Maximize total sequence identity
    model.setObjective(gp.quicksum(sequence_identity_coefficients[i] * x[i] 
                                   for i in range(n_institutions)), GRB.MAXIMIZE)
    
    # 4. CONSTRAINTS
    # Total resource constraint
    model.addConstr(gp.quicksum(x[i] for i in range(n_institutions)) <= total_resources_available, 
                    name="total_resource_constraint")
    
    # Building capacity constraints
    for i in range(n_institutions):
        model.addConstr(x[i] <= building_capacities[i], name=f"building_capacity_{i+1}")
    
    # 5. SOLVING & RESULTS
    model.optimize()
    
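    if model.status == GRB.OPTIMAL:
        print(f"Optimal value: {model.objVal}")
        for i in range(n_institutions):
            # Report only allocations that are meaningfully above zero
            if x[i].x > 1e-6:
                print(f"x[{i+1}] = {x[i].x:.3f}")
    elif model.status == GRB.INFEASIBLE:
        print("Problem is infeasible")
    elif model.status == GRB.UNBOUNDED:
        print("Problem is unbounded")
    else:
        # Covers INF_OR_UNBD, time limits, and other non-optimal statuses
        print(f"Solver stopped with status code {model.status}")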
    
    return model

# Run the optimization
optimize_protein_institute()
```

### Execution Results
**Status**: OPTIMAL
**Optimal Value**: 540.0
**Execution Time**: 0.67 seconds
**Reliability**: high
**Analysis**: Gurobipy returned an optimal solution with objective value 540.0 in 0.67 seconds, the shortest run time of the three implementations.

## 6. DOCplex Implementation

```python
# Complete DOCPLEX implementation

from docplex.mp.model import Model

def protein_institute_optimization():
    """Optimize resource allocation to maximize sequence identity"""
    
    # 1. MODEL & DATA SETUP
    mdl = Model(name="protein_institute")
    
    # Data from the problem statement
    sequence_identity_coefficients = [0.85, 0.9, 0.8]
    building_capacities = [600, 700, 500]
    total_resources_available = 600
    
    # CRITICAL: Validate array lengths to prevent IndexError
    assert len(sequence_identity_coefficients) == len(building_capacities), "Array length mismatch"
    safe_range = range(min(len(sequence_identity_coefficients), len(building_capacities)))  # Safe indexing
    
    # 2. VARIABLES
    # Allocation variables, named x_1..x_3 to match the formulation
    allocation = {i: mdl.continuous_var(name=f"x_{i+1}", lb=0) for i in safe_range}
    
    # 3. OBJECTIVE FUNCTION
    # Maximize total sequence identity
    objective = mdl.sum(sequence_identity_coefficients[i] * allocation[i] for i in safe_range)
    mdl.maximize(objective)
    
    # 4. CONSTRAINTS
    
    # Total Resource Constraint
    total_allocation = mdl.sum(allocation[i] for i in safe_range)
    mdl.add_constraint(total_allocation <= total_resources_available, ctname="total_resource_limit")
    
    # Building Capacity Constraints
    for i in safe_range:
        mdl.add_constraint(allocation[i] <= building_capacities[i], ctname=f"capacity_limit_{i+1}")
    
    # 5. SOLVING & RESULTS
    solution = mdl.solve()
    
    if solution:
        print(f"Optimal value: {solution.objective_value}")
        for i in safe_range:
            value = solution.get_value(allocation[i])
            if value > 1e-6:
                print(f"Allocation for institution {i+1}: {value:.3f}")
    else:
        print("No solution found")
        print(f"Status: {mdl.solve_details.status}")
    
    return mdl

# Run the optimization
protein_institute_optimization()
```

### Execution Results
**Status**: OPTIMAL
**Optimal Value**: 540.0
**Execution Time**: 2.61 seconds
**Reliability**: high
**Analysis**: DOCplex returned the same optimal value, 540.0, in 2.61 seconds, slower than Gurobipy's 0.67 seconds.

## 7. Pyomo Implementation

```python
# Complete PYOMO implementation

import pyomo.environ as pyo
from pyomo.opt import SolverFactory

def protein_institute_optimization():
    """Optimize resource allocation to maximize sequence identity"""
    
    # 1. MODEL CREATION
    model = pyo.ConcreteModel()
    
    # 2. DATA SETUP
    n_institutions = 3
    sequence_identity_coefficients = [0.85, 0.9, 0.8]
    building_capacities = [600, 700, 500]
    total_resources_available = 600
    
    # CRITICAL: Validate array lengths before indexing
    assert len(sequence_identity_coefficients) == len(building_capacities) == n_institutions, "Array length mismatch"
    
    # 3. SETS
    model.I = pyo.RangeSet(1, n_institutions)  # 1-based indexing
    
    # 4. PARAMETERS
    model.sequence_identity_coefficient = pyo.Param(model.I, initialize={i+1: sequence_identity_coefficients[i] for i in range(n_institutions)})
    model.building_capacity = pyo.Param(model.I, initialize={i+1: building_capacities[i] for i in range(n_institutions)})
    
    # 5. VARIABLES
    model.x = pyo.Var(model.I, within=pyo.NonNegativeReals)
    
    # 6. OBJECTIVE FUNCTION
    def obj_rule(model):
        return sum(model.sequence_identity_coefficient[i] * model.x[i] for i in model.I)
    model.objective = pyo.Objective(rule=obj_rule, sense=pyo.maximize)
    
    # 7. CONSTRAINTS
    
    # Total Resource Constraint
    def total_resource_constraint_rule(model):
        return sum(model.x[i] for i in model.I) <= total_resources_available
    model.total_resource_constraint = pyo.Constraint(rule=total_resource_constraint_rule)
    
    # Building Capacity Constraints
    def building_capacity_rule(model, i):
        return model.x[i] <= model.building_capacity[i]
    model.building_capacity_constraint = pyo.Constraint(model.I, rule=building_capacity_rule)
    
    # 8. SOLVING WITH GUROBI
    solver = SolverFactory('gurobi')
    
    # Solve the model
    results = solver.solve(model, tee=True)
    
    # 9. RESULT PROCESSING
    # Check solver status
    if results.solver.termination_condition == pyo.TerminationCondition.optimal:
        print("Optimal solution found!")
        print(f"Optimal value: {pyo.value(model.objective):.3f}")
        
        # Extract variable values
        print("\nVariable values:")
        for i in model.I:
            x_val = pyo.value(model.x[i])
            if x_val > 1e-6:  # Only print non-zero values
                print(f"x[{i}] = {x_val:.3f}")
        
    elif results.solver.termination_condition == pyo.TerminationCondition.infeasible:
        print("Problem is infeasible")
    elif results.solver.termination_condition == pyo.TerminationCondition.unbounded:
        print("Problem is unbounded")
    else:
        print(f"Solver terminated with condition: {results.solver.termination_condition}")
    
    return model

# Run the optimization
protein_institute_optimization()
```

### Execution Results
**Status**: OPTIMAL
**Optimal Value**: 540.0
**Execution Time**: 3.08 seconds
**Reliability**: high
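**Analysis**: Pyomo returned the same optimal value, 540.0, with the longest run time (3.08 seconds). The script calls Gurobi through `SolverFactory('gurobi')`, so this run confirms the Pyomo model formulation rather than providing an independent solver check.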

## 8. Cross-Solver Analysis and Final Recommendation

### Solver Results Comparison

| Solver | Status | Optimal Value | Execution Time | Decision Variables | Retry Attempt |
|--------|--------|---------------|----------------|-------------------|---------------|
| Gurobipy | OPTIMAL | 540.00 | 0.67s | N/A | N/A |
| DOCplex | OPTIMAL | 540.00 | 2.61s | N/A | N/A |
| Pyomo | OPTIMAL | 540.00 | 3.08s | N/A | N/A |

### Solver Consistency Analysis
**Result**: All solvers produced consistent results ✓
**Consistent Solvers**: gurobipy, docplex, pyomo
**Majority Vote Optimal Value**: 540.0

### Final Recommendation
**Recommended Optimal Value**: 540.0
**Confidence Level**: HIGH
**Preferred Solver(s)**: gurobipy
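**Reasoning**: Gurobipy is preferred because it had the shortest measured execution time (0.67 s versus 2.61 s and 3.08 s) while returning the same optimal value as the other two implementations. For a model this small, the timing differences likely reflect startup overhead more than solve time.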
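
As a check that does not depend on any solver, a dual certificate can be written out by hand. This is a sketch: the dual variables \( y \) (for the total resource constraint) and \( u_1, u_2, u_3 \) (for the building capacity constraints) are notation introduced here and do not appear in the original formulation. The dual of the model in Section 4 is:

\[ \text{Minimize } 600y + 600u_1 + 700u_2 + 500u_3 \]
\[ y + u_1 \geq 0.85, \quad y + u_2 \geq 0.9, \quad y + u_3 \geq 0.8, \quad y, u_1, u_2, u_3 \geq 0 \]

The point \( y = 0.9 \), \( u_1 = u_2 = u_3 = 0 \) satisfies every dual constraint and has dual objective \( 600 \times 0.9 = 540 \). By weak duality, no feasible allocation can score above 540, and the allocation \( x_2 = 600 \), \( x_1 = x_3 = 0 \) attains exactly that value, so 540.0 is optimal.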

### Optimal Decision Variables
- **x_1** = 0.000
  - *Business Meaning*: Resources allocated to institution 1; none are allocated at the optimum.
- **x_2** = 600.000
  - *Business Meaning*: Resources allocated to institution 2; it receives the entire budget of 600.
- **x_3** = 0.000
  - *Business Meaning*: Resources allocated to institution 3; none are allocated at the optimum.

### Business Interpretation
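**Overall Strategy**: Allocate all 600 units of available resources to institution 2, which has the highest sequence identity coefficient (0.9) and a building capacity (700) large enough to absorb the full budget.
**Objective Value Meaning**: The optimal objective value of 540.0 equals 0.9 × 600 and is the maximum total sequence identity achievable under the constraints.
**Resource Allocation Summary**: Institution 2 receives 600; institutions 1 and 3 receive 0. The total resource constraint is binding, while every building capacity constraint has slack.
**Implementation Recommendations**: Relative to the stored allocation (150, 250, 200), move the 150 units from institution 1 and the 200 units from institution 3 to institution 2. This raises the total sequence identity from 512.5 to 540.0 and keeps institution 2 within its building capacity of 700.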