# Complete Optimization Problem and Solution: protein_institute

## 1. Problem Context and Goals

### Context  
The protein institute allocates research resources across several institutions. Its objective is to maximize total sequence identity to human proteins, a measure of how closely protein sequences match their human counterparts. Each decision variable is the amount of resources assigned to one institution, and all decision variables are continuous.

The objective is linear: total sequence identity is the sum, over institutions, of each institution's sequence identity coefficient multiplied by the resources allocated to it. The business configuration supplies three parameters: the total resources available, which caps the sum of all allocations; the sequence identity coefficient, which is the objective coefficient; and the building capacity of each institution, which caps the resources that institution can receive.

The data reflects current operations. Every resource limit is expressed as a linear constraint, and the model contains no nonlinear relationships such as products or ratios of decision variables. Business configuration parameters are referred to by the same names throughout.
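As a rough illustration of the linear objective described above, the sketch below evaluates it for the values listed under "Current Stored Values". The function name `total_sequence_identity` is hypothetical, and pairing coefficients with allocations by position is an assumption, since the `protein` table carries no institution key.

```python
# Values copied from the "Current Stored Values" section of this document.
# Assumption: the i-th coefficient belongs to the i-th institution.
coefficients = [0.85, 0.9, 0.8]
allocations = [150.0, 250.0, 200.0]


def total_sequence_identity(coefficients, allocations):
    """Linear objective: sum of coefficient * allocation over institutions."""
    if len(coefficients) != len(allocations):
        raise ValueError("need exactly one coefficient per allocation")
    return sum(c * x for c, x in zip(coefficients, allocations))


objective = total_sequence_identity(coefficients, allocations)
print(f"total sequence identity: {objective:.1f}")
```

Because the objective is a plain weighted sum, doubling every allocation doubles the result, which is the property that keeps the problem a linear program.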

### Goals  
The goal is to maximize total sequence identity to human proteins across all institutions by choosing how much research resource each institution receives. The metric is the sum, over institutions, of the sequence identity coefficient multiplied by the resources allocated, with the coefficients taken from the `sequence_identity_coefficient` column. An allocation is judged solely by the value of this metric. The goal is stated here in natural language rather than in mathematical or symbolic notation.

## 2. Constraints    

The optimization problem is subject to two linear constraints:

- **Total resource limit**: The resources allocated across all institutions must not exceed the total resources available, as defined in the business configuration. This keeps the allocation within what the institute can actually distribute.

- **Building capacity limit**: The resources allocated to each institution must not exceed the capacity of the building associated with it. This keeps the allocation within the physical limits of each institution's infrastructure.

Both constraints are sums or single-variable bounds, so neither introduces a nonlinear relationship.
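A problem with one shared budget and a per-institution upper bound is a continuous knapsack, so it can be solved by filling institutions in descending order of coefficient. The following is a minimal sketch under stated assumptions, not the institute's actual solver: `allocate` and `TOTAL_RESOURCES` are hypothetical names, the value 1000.0 is invented for illustration because this document does not state the total resources available, and allocations are assumed to be non-negative.

```python
def allocate(coefficients, capacities, total_resources):
    """Greedy solution for max sum(c_i * x_i) subject to
    sum(x_i) <= total_resources and 0 <= x_i <= capacity_i."""
    if len(coefficients) != len(capacities):
        raise ValueError("need exactly one capacity per coefficient")
    remaining = float(total_resources)
    allocation = [0.0] * len(coefficients)
    # Visit institutions from the highest coefficient to the lowest.
    order = sorted(range(len(coefficients)),
                   key=lambda i: coefficients[i], reverse=True)
    for i in order:
        if remaining <= 0 or coefficients[i] <= 0:
            break  # budget exhausted, or no further gain is possible
        allocation[i] = min(float(capacities[i]), remaining)
        remaining -= allocation[i]
    return allocation


# Coefficients and capacities come from "Current Stored Values".
coefficients = [0.85, 0.9, 0.8]
capacities = [600, 700, 500]
TOTAL_RESOURCES = 1000.0  # hypothetical: not given in this document

plan = allocate(coefficients, capacities, TOTAL_RESOURCES)
objective = sum(c * x for c, x in zip(coefficients, plan))
print(plan, round(objective, 2))
```

Under the assumed budget, the institution with coefficient 0.9 is filled to its capacity first and the remainder goes to the institution with coefficient 0.85. A general-purpose LP solver would be the more usual choice once further constraints are added.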

## 3. Available Data  

### Database Schema  
```sql
-- Iteration 1 Database Schema
-- Objective: Schema changes include creating new tables for missing optimization data, modifying existing tables to improve mapping adequacy, and updating business configuration logic for scalar parameters and formulas.

CREATE TABLE ResourceAllocation (
  institution_id INTEGER,
  allocation FLOAT
);

CREATE TABLE protein (
  sequence_identity_coefficient FLOAT
);

CREATE TABLE building (
  building_capacity INTEGER
);
```

### Data Dictionary  
The data dictionary maps each table and column to its business purpose and its role in the optimization:

- **`ResourceAllocation` table**: Holds the resources allocated to each institution. The `institution_id` column identifies the institution that an allocation belongs to. The `allocation` column is the amount of resources allocated to that institution and is the decision variable.

- **`protein` table**: Stores protein data. The `sequence_identity_coefficient` column is the coefficient for sequence identity to human proteins and is the objective coefficient.

- **`building` table**: Stores building data. The `building_capacity` column is the capacity of the building associated with each institution and is the upper bound in the building capacity constraint.

Note that `protein` and `building` have no `institution_id` column, so the schema itself does not record which row belongs to which institution.

### Current Stored Values  
```sql
-- Iteration 1 Realistic Data
-- Generated by triple expert (business + data + optimization)
-- Values were determined based on typical research resource allocation scenarios, ensuring that the total resources and building capacities align with realistic institutional capabilities.

-- Realistic data for ResourceAllocation
INSERT INTO ResourceAllocation (institution_id, allocation) VALUES (1, 150.0);
INSERT INTO ResourceAllocation (institution_id, allocation) VALUES (2, 250.0);
INSERT INTO ResourceAllocation (institution_id, allocation) VALUES (3, 200.0);

-- Realistic data for protein
INSERT INTO protein (sequence_identity_coefficient) VALUES (0.85);
INSERT INTO protein (sequence_identity_coefficient) VALUES (0.9);
INSERT INTO protein (sequence_identity_coefficient) VALUES (0.8);

-- Realistic data for building
INSERT INTO building (building_capacity) VALUES (600);
INSERT INTO building (building_capacity) VALUES (700);
INSERT INTO building (building_capacity) VALUES (500);
```
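The stored allocation can be checked against the building capacities by loading the rows into an in-memory SQLite database. This is only a sketch: it assumes that the n-th row of `protein` and `building` belongs to institution n, and it uses SQLite's implicit `rowid` as a stand-in join key because the schema defines none. The total resource limit is not checked, since its value is not given in this document.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ResourceAllocation (institution_id INTEGER, allocation FLOAT);
CREATE TABLE protein (sequence_identity_coefficient FLOAT);
CREATE TABLE building (building_capacity INTEGER);
INSERT INTO ResourceAllocation VALUES (1, 150.0), (2, 250.0), (3, 200.0);
INSERT INTO protein VALUES (0.85), (0.9), (0.8);
INSERT INTO building VALUES (600), (700), (500);
""")

# Assumption: rows were inserted in institution order, so rowid n
# in protein and building corresponds to institution_id n.
rows = conn.execute("""
SELECT r.institution_id, r.allocation,
       p.sequence_identity_coefficient, b.building_capacity
FROM ResourceAllocation AS r
JOIN protein AS p ON p.rowid = r.institution_id
JOIN building AS b ON b.rowid = r.institution_id
ORDER BY r.institution_id
""").fetchall()

within_capacity = all(alloc <= cap for _, alloc, _, cap in rows)
total_allocated = sum(alloc for _, alloc, _, _ in rows)
objective = sum(alloc * coef for _, alloc, coef, _ in rows)

print("within building capacity:", within_capacity)
print("total allocated:", total_allocated)
print(f"objective: {objective:.1f}")
conn.close()
```

Adding an explicit `institution_id` column to `protein` and `building` would remove the dependence on insertion order.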