# Complete Optimization Problem and Solution: music_1

## 1. Problem Context and Goals

### Context  
A music streaming platform is focused on optimizing its storage and bandwidth usage by strategically selecting a subset of songs to store locally on its servers. The platform must decide which songs to store locally, represented by binary decisions for each song. The primary goal is to minimize the total file size of the songs stored locally, ensuring efficient use of storage resources.  

The platform has established several operational parameters to maintain a diverse and high-quality music library. These include:  
- A minimum number of songs that must be stored locally to ensure a substantial library for users.  
- A minimum average rating for the stored songs to maintain high-quality content and enhance user satisfaction.  
- A maximum number of songs per artist to prevent overrepresentation of any single artist and promote diversity.  
- A minimum number of songs per genre to ensure a well-rounded music library across different genres.  

These parameters are defined in the business configuration and form the basis of the optimization constraints. Each decision is simply whether a song is stored locally, and the model is linear: it contains no products of decision variables, and the one ratio in the problem (the average rating) can be rewritten as an equivalent linear inequality.  

### Goals  
The goal is to select a subset of songs that satisfies all operational constraints while keeping their combined file size as small as possible. Success is measured by that total file size: the smaller it is, the less local storage the platform uses while still maintaining a diverse, high-quality library. The model has no explicit storage-capacity limit; storage efficiency enters only through the objective.  

## 2. Constraints  

The optimization problem is subject to the following constraints, which ensure the platform meets its operational and quality requirements:  
1. **Minimum Total Songs Stored**: The total number of songs stored locally must meet or exceed a specified minimum. This ensures a substantial library for users.  
2. **Minimum Average Rating**: The average rating of the songs stored locally must meet or exceed a specified threshold. This maintains high-quality content and enhances user satisfaction.  
3. **Maximum Songs per Artist**: The number of songs stored locally for any single artist must not exceed a specified limit. This prevents overrepresentation of any artist and promotes diversity.  
4. **Minimum Songs per Genre**: The number of songs stored locally for each genre must meet or exceed a specified minimum. This ensures a diverse music library across different genres.  

All four constraints are linear in the binary storage decisions once the average-rating requirement is rewritten as shown in Section 4, so the problem is a binary integer linear program that standard MIP solvers handle directly.  
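
The four requirements can be checked directly for any candidate selection. The sketch below is a plain-Python feasibility test, assuming songs are described by a dict of ratings, artists and genres and that the thresholds live in a dict with hypothetical key names; the example values mirror those used in the implementations of Sections 5-7. It evaluates the average rating as a ratio, exactly as constraint 2 is worded.

```python
from collections import Counter

# Example catalogue and thresholds mirroring the implementations in Sections 5-7
# (the dict layout and key names are illustrative assumptions).
SONGS = {
    1: {"rating": 4, "artist": "ArtistA", "genre": "Pop"},
    2: {"rating": 5, "artist": "ArtistB", "genre": "Rock"},
    3: {"rating": 3, "artist": "ArtistA", "genre": "Pop"},
}
PARAMS = {"min_total_songs": 2, "min_avg_rating": 4,
          "max_songs_per_artist": 1, "min_songs_per_genre": 1}


def is_feasible(selected, songs, params):
    """Return True if the set of song ids `selected` meets all four constraints."""
    # 1. Minimum total songs stored
    if len(selected) < params["min_total_songs"]:
        return False
    # 2. Minimum average rating, evaluated as a ratio (skipped for an empty selection)
    if selected:
        average = sum(songs[i]["rating"] for i in selected) / len(selected)
        if average < params["min_avg_rating"]:
            return False
    # 3. Maximum songs per artist
    per_artist = Counter(songs[i]["artist"] for i in selected)
    if any(n > params["max_songs_per_artist"] for n in per_artist.values()):
        return False
    # 4. Minimum songs per genre, for every genre present in the catalogue
    per_genre = Counter(songs[i]["genre"] for i in selected)
    genres = {song["genre"] for song in songs.values()}
    return all(per_genre[g] >= params["min_songs_per_genre"] for g in genres)


print(is_feasible({2, 3}, SONGS, PARAMS))  # True
print(is_feasible({1, 3}, SONGS, PARAMS))  # False: average 3.5 and two ArtistA songs
```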

## 3. Available Data  

### Database Schema  
```sql
-- Iteration 1 Database Schema
-- Objective: Schema changes include creating new tables for decision variables and constraints, moving scalar parameters to business_configuration_logic.json, and updating the data dictionary to reflect optimization mappings.

CREATE TABLE files (
  file_size INTEGER,
  is_stored_locally BOOLEAN
);

CREATE TABLE decision_variables (
  song_id INTEGER,
  is_stored_locally BOOLEAN
);

CREATE TABLE constraints (
  constraint_type TEXT,
  constraint_value INTEGER
);
```

### Data Dictionary  
The data dictionary provides a clear mapping of tables and columns to their business purposes and optimization roles:  
- **files**: Stores metadata about songs, including file size and storage decisions.  
  - `file_size`: Represents the file size of the song in MB. This is used as a coefficient in the objective function to minimize total storage.  
  - `is_stored_locally`: Indicates whether the song is stored locally. This serves as the binary decision variable in the optimization problem.  
- **decision_variables**: Contains binary decision variables for song storage.  
  - `song_id`: Unique identifier for each song, used to index the decision variables.  
  - `is_stored_locally`: Indicates whether the song is stored locally, aligning with the decision variable in the optimization problem.  
- **constraints**: Defines the constraints for the optimization problem.  
  - `constraint_type`: Specifies the type of constraint (e.g., genre, artist).  
  - `constraint_value`: Represents the value of the constraint (e.g., minimum songs per genre), serving as the bound for the constraint in the optimization problem.  

### Current Stored Values  
```sql
-- Iteration 1 Realistic Data
-- Generated by triple expert (business + data + optimization)
-- Values were determined based on realistic scenarios for a music streaming platform, considering typical file sizes, song ratings, and genre/artist distributions. Parameters were set to ensure a diverse and high-quality library while optimizing storage.

-- Realistic data for files
INSERT INTO files (file_size, is_stored_locally) VALUES (8, False);
INSERT INTO files (file_size, is_stored_locally) VALUES (12, True);
INSERT INTO files (file_size, is_stored_locally) VALUES (6, False);

-- Realistic data for decision_variables
INSERT INTO decision_variables (song_id, is_stored_locally) VALUES (1, False);
INSERT INTO decision_variables (song_id, is_stored_locally) VALUES (2, True);
INSERT INTO decision_variables (song_id, is_stored_locally) VALUES (3, False);

-- Realistic data for constraints
INSERT INTO constraints (constraint_type, constraint_value) VALUES ('genre', 10);
INSERT INTO constraints (constraint_type, constraint_value) VALUES ('artist', 5);
INSERT INTO constraints (constraint_type, constraint_value) VALUES ('rating', 4);
```
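
To see what the stored tables actually provide, the rows above can be loaded into an in-memory SQLite database with Python's standard `sqlite3` module. This sketch assumes that the i-th row of `files` corresponds to song i (the table has no song identifier), writes the boolean values as 0/1, and declares `constraint_type` as `TEXT`:

```python
import sqlite3

# Tables and rows from the SQL blocks above; booleans written as 0/1.
SETUP_SQL = """
CREATE TABLE files (file_size INTEGER, is_stored_locally BOOLEAN);
CREATE TABLE decision_variables (song_id INTEGER, is_stored_locally BOOLEAN);
CREATE TABLE constraints (constraint_type TEXT, constraint_value INTEGER);
INSERT INTO files VALUES (8, 0), (12, 1), (6, 0);
INSERT INTO decision_variables VALUES (1, 0), (2, 1), (3, 0);
INSERT INTO constraints VALUES ('genre', 10), ('artist', 5), ('rating', 4);
"""


def load_data(conn):
    """Return ({song_id: file_size}, {constraint_type: constraint_value})."""
    # Assumption: files has no song_id column, so song i is taken to be
    # the i-th inserted row (SQLite rowid i).
    sizes = dict(conn.execute("SELECT rowid, file_size FROM files ORDER BY rowid"))
    limits = dict(conn.execute("SELECT constraint_type, constraint_value FROM constraints"))
    return sizes, limits


conn = sqlite3.connect(":memory:")
conn.executescript(SETUP_SQL)
file_sizes, limits = load_data(conn)
print(file_sizes)               # {1: 8, 2: 12, 3: 6}
print(limits)                   # {'genre': 10, 'artist': 5, 'rating': 4}
print("total_songs" in limits)  # False
```

The last line confirms that there is no `total_songs` row, so `min_total_songs` cannot be read from this table as stored.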

## 4. Mathematical Optimization Formulation

#### Decision Variables
- Let \( x_i \) be a binary decision variable where:  
  \( x_i = 1 \) if song \( i \) is stored locally,  
  \( x_i = 0 \) otherwise.  
  (Source: `decision_variables.is_stored_locally`)

#### Objective Function
Minimize the total file size of songs stored locally:  
\[
\text{Minimize } \sum_{i} \mathrm{file\_size}_i \cdot x_i
\]  
(Source: `files.file_size`)
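
As a worked example, take the file sizes stored in `files` (8, 12 and 6 MB, assuming the i-th row is song i, since the table has no song identifier) and the decision \( x = (0, 1, 1) \), that is, storing songs 2 and 3 only. Under the example data used in Sections 5-7, this is the feasible choice that attains the reported optimum:

\[
\sum_{i} \mathrm{file\_size}_i \cdot x_i = 8 \cdot 0 + 12 \cdot 1 + 6 \cdot 1 = 18 \text{ MB}
\]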

#### Constraints
1. **Minimum Total Songs Stored**:  
   \[
   \sum_{i} x_i \geq \mathrm{min\_total\_songs}
   \]  
   (Source: `constraints.constraint_value` where `constraint_type = 'total_songs'`)

2. **Minimum Average Rating**:  
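   \[
   \frac{\sum_{i} \mathrm{rating}_i \cdot x_i}{\sum_{i} x_i} \geq \mathrm{min\_avg\_rating}
   \]  
   This ratio is not linear. Because the minimum-total-songs constraint guarantees \( \sum_{i} x_i > 0 \) whenever \( \mathrm{min\_total\_songs} \geq 1 \), both sides can be multiplied by the denominator without changing the direction of the inequality; collecting terms gives the equivalent linear constraint:  
   \[
   \sum_{i} (\mathrm{rating}_i - \mathrm{min\_avg\_rating}) \cdot x_i \geq 0
   \]  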
   (Source: `constraints.constraint_value` where `constraint_type = 'rating'`)

3. **Maximum Songs per Artist**:  
   For each artist \( a \):  
   \[
   \sum_{i \in \text{songs by artist } a} x_i \leq \mathrm{max\_songs\_per\_artist}
   \]  
   (Source: `constraints.constraint_value` where `constraint_type = 'artist'`)

4. **Minimum Songs per Genre**:  
   For each genre \( g \):  
   \[
   \sum_{i \in \text{songs in genre } g} x_i \geq \mathrm{min\_songs\_per\_genre}
   \]  
   (Source: `constraints.constraint_value` where `constraint_type = 'genre'`)
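
The linearization of constraint 2 can be checked by brute force. This sketch, using the example ratings from Sections 5-7 and a threshold of 4, compares the ratio form and the linear form on every non-empty subset of songs; the empty set is excluded because the ratio is undefined there.

```python
from itertools import combinations

RATINGS = {1: 4, 2: 5, 3: 3}  # example ratings used in Sections 5-7
MIN_AVG = 4


def ratio_ok(selected):
    """Original form: average rating of the selection meets the threshold."""
    return sum(RATINGS[i] for i in selected) / len(selected) >= MIN_AVG


def linear_ok(selected):
    """Linearized form: sum of (rating - threshold) over the selection is non-negative."""
    return sum(RATINGS[i] - MIN_AVG for i in selected) >= 0


# Every non-empty subset of song ids
subsets = [set(c) for k in range(1, len(RATINGS) + 1)
           for c in combinations(RATINGS, k)]
mismatches = [s for s in subsets if ratio_ok(s) != linear_ok(s)]
print(len(subsets), mismatches)  # 7 []
```

Note that `linear_ok(set())` returns `True` (0 ≥ 0), so the linear form alone would accept storing nothing; the minimum-total-songs constraint is what rules that case out.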

#### Data Source Verification
- `files.file_size`: Coefficient for the objective function.  
- `decision_variables.is_stored_locally`: Binary decision variable \( x_i \).  
- `constraints.constraint_value`:  
  - `min_total_songs`: From `constraint_type = 'total_songs'`.  
  - `min_avg_rating`: From `constraint_type = 'rating'`.  
  - `max_songs_per_artist`: From `constraint_type = 'artist'`.  
  - `min_songs_per_genre`: From `constraint_type = 'genre'`.  

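This formulation is linear, but the stored data above do not supply every coefficient it needs: the tables contain no song ratings or artist and genre assignments, and `constraints` has no `total_songs` row. The implementations in Sections 5-7 therefore use illustrative example values (three songs with ratings 4, 5 and 3; `min_total_songs = 2`, `min_avg_rating = 4`, `max_songs_per_artist = 1`, `min_songs_per_genre = 1`). The rating threshold matches the stored value of 4, but the artist and genre limits differ from the stored values of 5 and 10; with a minimum of 10 songs per genre, a three-song catalogue would make the model infeasible.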
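
Constraints 3 and 4 need, for each artist and each genre, the set of songs it covers. A minimal sketch of building those index sets from per-song records, assuming the record layout used in the Gurobipy implementation (`id`, `artist`, `genre` keys); `group_ids` is a hypothetical helper:

```python
from collections import defaultdict

# Example catalogue mirroring the data in Sections 5-7.
SONGS = [
    {"id": 1, "artist": "ArtistA", "genre": "Pop"},
    {"id": 2, "artist": "ArtistB", "genre": "Rock"},
    {"id": 3, "artist": "ArtistA", "genre": "Pop"},
]


def group_ids(songs, key):
    """Map each distinct value of `key` to the sorted list of song ids having it."""
    groups = defaultdict(list)
    for song in songs:
        groups[song[key]].append(song["id"])
    return {value: sorted(ids) for value, ids in groups.items()}


by_artist = group_ids(SONGS, "artist")  # {'ArtistA': [1, 3], 'ArtistB': [2]}
by_genre = group_ids(SONGS, "genre")    # {'Pop': [1, 3], 'Rock': [2]}

# One constraint per group: constraint 3 uses "<=", constraint 4 uses ">=".
for ids in by_artist.values():
    print(" + ".join(f"x_{i}" for i in ids), "<= max_songs_per_artist")
for ids in by_genre.values():
    print(" + ".join(f"x_{i}" for i in ids), ">= min_songs_per_genre")
```

The model therefore gets one artist constraint per distinct artist and one genre constraint per distinct genre, however many songs each contains.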

## 5. Gurobipy Implementation

```python
#!/usr/bin/env python3
"""
Gurobipy 12.0.2 Implementation for Music Storage Optimization
"""

import gurobipy as gp
from gurobipy import GRB

def music_storage_optimization():
    """Optimize music storage to minimize total file size while meeting constraints."""
    
    # 1. MODEL & DATA SETUP
    model = gp.Model("music_storage_optimization")
    
    # Example data (replace with actual data from database)
    songs = [
        {'id': 1, 'file_size': 8, 'rating': 4, 'artist': 'ArtistA', 'genre': 'Pop'},
        {'id': 2, 'file_size': 12, 'rating': 5, 'artist': 'ArtistB', 'genre': 'Rock'},
        {'id': 3, 'file_size': 6, 'rating': 3, 'artist': 'ArtistA', 'genre': 'Pop'}
    ]
    
    constraints = {
        'min_total_songs': 2,
        'min_avg_rating': 4,
        'max_songs_per_artist': 1,
        'min_songs_per_genre': 1
    }
    
    # Extract unique artists and genres
    artists = list(set(song['artist'] for song in songs))
    genres = list(set(song['genre'] for song in songs))
    
    # Sanity-check that the input data are non-empty before building the model
    assert len(songs) > 0, "No songs provided"
    assert len(artists) > 0, "No artists provided"
    assert len(genres) > 0, "No genres provided"
    
    # 2. VARIABLES
    x = {song['id']: model.addVar(vtype=GRB.BINARY, name=f"x_{song['id']}") for song in songs}
    
    # 3. OBJECTIVE FUNCTION
    model.setObjective(gp.quicksum(song['file_size'] * x[song['id']] for song in songs), GRB.MINIMIZE)
    
    # 4. CONSTRAINTS
    
    # Minimum Total Songs Stored
    model.addConstr(gp.quicksum(x[song['id']] for song in songs) >= constraints['min_total_songs'], name="min_total_songs")
    
    # Minimum Average Rating (Linearized)
    model.addConstr(gp.quicksum((song['rating'] - constraints['min_avg_rating']) * x[song['id']] for song in songs) >= 0, name="min_avg_rating")
    
    # Maximum Songs per Artist
    for artist in artists:
        artist_songs = [song['id'] for song in songs if song['artist'] == artist]
        model.addConstr(gp.quicksum(x[song_id] for song_id in artist_songs) <= constraints['max_songs_per_artist'], name=f"max_songs_artist_{artist}")
    
    # Minimum Songs per Genre
    for genre in genres:
        genre_songs = [song['id'] for song in songs if song['genre'] == genre]
        model.addConstr(gp.quicksum(x[song_id] for song_id in genre_songs) >= constraints['min_songs_per_genre'], name=f"min_songs_genre_{genre}")
    
    # 5. SOLVING & RESULTS
    model.optimize()
    
    if model.status == GRB.OPTIMAL:
        print(f"Optimal value: {model.objVal}")
        for song in songs:
            if x[song['id']].x > 0.5:  # binary variable: treat values near 1 as selected
                print(f"Song {song['id']} is stored locally.")
    elif model.status == GRB.INFEASIBLE:
        print("Problem is infeasible")
    elif model.status == GRB.UNBOUNDED:
        print("Problem is unbounded")
    else:
        print(f"Optimization ended with status code {model.status}")
    
    return model

# Run the optimization
if __name__ == "__main__":
    music_storage_optimization()
```

### Execution Results
**Status**: OPTIMAL
**Optimal Value**: 18.0
**Execution Time**: 0.16 seconds
**Reliability**: high
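**Analysis**: Gurobipy reached the optimal value in the shortest time of the three interfaces (0.16 s). On a three-variable model, the timing gap most likely reflects interface and start-up overhead rather than solver performance.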

## 6. DOCplex Implementation

```python
#!/usr/bin/env python3
"""
DOCPLEX implementation for music storage optimization problem
"""

from docplex.mp.model import Model

def music_storage_optimization():
    """Optimize music storage to minimize total file size while meeting constraints"""
    
    # 1. MODEL & DATA SETUP
    mdl = Model(name="music_storage_optimization")
    
    # Example data (replace with actual data loading)
    file_sizes = [8, 12, 6]  # File sizes in MB
    ratings = [4, 5, 3]      # Song ratings
    artists = [1, 2, 1]      # Artist IDs
    genres = [1, 2, 1]       # Genre IDs
    
    # Constraints
    min_total_songs = 2
    min_avg_rating = 4
    max_songs_per_artist = 1
    min_songs_per_genre = 1
    
    # Validate that all data arrays describe the same songs
    assert len(file_sizes) == len(ratings) == len(artists) == len(genres), "Array length mismatch"
    safe_range = range(len(file_sizes))  # song indices 0..n-1 (lengths already checked)
    
    # 2. VARIABLES
    x = {i: mdl.binary_var(name=f"x_{i}") for i in safe_range}  # Decision variables
    
    # 3. OBJECTIVE FUNCTION
    total_file_size = mdl.sum(file_sizes[i] * x[i] for i in safe_range)
    mdl.minimize(total_file_size)
    
    # 4. CONSTRAINTS
    
    # Minimum Total Songs Stored
    total_songs = mdl.sum(x[i] for i in safe_range)
    mdl.add_constraint(total_songs >= min_total_songs, ctname="min_total_songs")
    
    # Minimum Average Rating (linearized)
    total_rating = mdl.sum((ratings[i] - min_avg_rating) * x[i] for i in safe_range)
    mdl.add_constraint(total_rating >= 0, ctname="min_avg_rating")
    
    # Maximum Songs per Artist
    for artist in set(artists):
        artist_songs = mdl.sum(x[i] for i in safe_range if artists[i] == artist)
        mdl.add_constraint(artist_songs <= max_songs_per_artist, ctname=f"max_songs_artist_{artist}")
    
    # Minimum Songs per Genre
    for genre in set(genres):
        genre_songs = mdl.sum(x[i] for i in safe_range if genres[i] == genre)
        mdl.add_constraint(genre_songs >= min_songs_per_genre, ctname=f"min_songs_genre_{genre}")
    
    # 5. SOLVING & RESULTS
    solution = mdl.solve()
    
    if solution:
        print(f"Optimal value: {solution.objective_value}")
        for i in safe_range:
            value = solution.get_value(x[i])
            if value > 0.5:  # binary variable: treat values near 1 as selected
                print(f"Song {i + 1} is stored locally")  # i is 0-based; song ids start at 1
    else:
        print("No solution found")
        print(f"Status: {mdl.solve_details.status}")
    
    return mdl

# Run the optimization
if __name__ == "__main__":
    music_storage_optimization()
```

### Execution Results
**Status**: OPTIMAL
**Optimal Value**: 18.0
**Execution Time**: 1.12 seconds
**Reliability**: medium
**Analysis**: DOCplex reached the same optimal value but took 1.12 s, roughly seven times as long as Gurobipy; for a model this small, the difference most likely reflects interface overhead rather than solver efficiency.

## 7. Pyomo Implementation

```python
# Complete PYOMO implementation

import pyomo.environ as pyo
from pyomo.opt import SolverFactory

def music_storage_optimization():
    """Optimize music storage for a streaming platform."""
    
    # 1. MODEL CREATION
    model = pyo.ConcreteModel()
    
    # 2. DATA SETUP
    # Example data (replace with actual data from database)
    song_ids = [1, 2, 3]
    file_sizes = {1: 8, 2: 12, 3: 6}
    ratings = {1: 4, 2: 5, 3: 3}
    artists = {1: 'A', 2: 'B', 3: 'A'}
    genres = {1: 'Pop', 2: 'Rock', 3: 'Pop'}
    
    # Constraints
    min_total_songs = 2
    min_avg_rating = 4
    max_songs_per_artist = 1
    min_songs_per_genre = 1
    
    # Validate array lengths
    assert len(song_ids) == len(file_sizes) == len(ratings) == len(artists) == len(genres), "Array length mismatch"
    
    # 3. SETS
    model.I = pyo.Set(initialize=song_ids)  # Set of songs
    
    # 4. PARAMETERS
    model.file_size = pyo.Param(model.I, initialize=file_sizes)
    model.rating = pyo.Param(model.I, initialize=ratings)
    model.artist = pyo.Param(model.I, initialize=artists, within=pyo.Any)  # string-valued
    model.genre = pyo.Param(model.I, initialize=genres, within=pyo.Any)    # string-valued
    
    # 5. VARIABLES
    model.x = pyo.Var(model.I, within=pyo.Binary)  # Binary decision variable for song storage
    
    # 6. OBJECTIVE FUNCTION
    def obj_rule(model):
        return sum(model.file_size[i] * model.x[i] for i in model.I)
    model.objective = pyo.Objective(rule=obj_rule, sense=pyo.minimize)
    
    # 7. CONSTRAINTS
    
    # Minimum Total Songs Stored
    def min_total_songs_rule(model):
        return sum(model.x[i] for i in model.I) >= min_total_songs
    model.min_total_songs_constraint = pyo.Constraint(rule=min_total_songs_rule)
    
    # Minimum Average Rating
    def min_avg_rating_rule(model):
        return sum((model.rating[i] - min_avg_rating) * model.x[i] for i in model.I) >= 0
    model.min_avg_rating_constraint = pyo.Constraint(rule=min_avg_rating_rule)
    
    # Maximum Songs per Artist
    def max_songs_per_artist_rule(model, a):
        return sum(model.x[i] for i in model.I if model.artist[i] == a) <= max_songs_per_artist
    model.A = pyo.Set(initialize=sorted(set(artists.values())))  # artists present in the data
    model.max_songs_per_artist_constraint = pyo.Constraint(model.A, rule=max_songs_per_artist_rule)
    
    # Minimum Songs per Genre
    def min_songs_per_genre_rule(model, g):
        return sum(model.x[i] for i in model.I if model.genre[i] == g) >= min_songs_per_genre
    model.G = pyo.Set(initialize=sorted(set(genres.values())))  # genres present in the data
    model.min_songs_per_genre_constraint = pyo.Constraint(model.G, rule=min_songs_per_genre_rule)
    
    # 8. SOLVING WITH GUROBI
    solver = SolverFactory('gurobi')
    
    # Optional: Set solver options
    solver.options['TimeLimit'] = 300  # 5 minutes
    solver.options['MIPGap'] = 0.01    # 1% gap
    
    # Solve the model
    results = solver.solve(model, tee=True)  # tee=True shows solver output
    
    # 9. RESULT PROCESSING
    # Check solver status
    if results.solver.termination_condition == pyo.TerminationCondition.optimal:
        print("Optimal solution found!")
        print(f"Optimal value: {pyo.value(model.objective)}")
        
        # Extract variable values
        print("\nVariable values:")
        for i in model.I:
            x_val = pyo.value(model.x[i])
            if x_val > 0.5:  # binary variable: treat values near 1 as selected
                print(f"x[{i}] = {round(x_val)}")
        
    elif results.solver.termination_condition == pyo.TerminationCondition.infeasible:
        print("Problem is infeasible")
    elif results.solver.termination_condition == pyo.TerminationCondition.unbounded:
        print("Problem is unbounded")
    else:
        print(f"Solver terminated with condition: {results.solver.termination_condition}")
    
    return model

# Run the optimization
if __name__ == "__main__":
    music_storage_optimization()
```

### Execution Results
**Status**: OPTIMAL
**Optimal Value**: 18.0
**Execution Time**: 1.13 seconds
**Reliability**: medium
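**Analysis**: Pyomo reached the same optimal value in 1.13 s, essentially tied with DOCplex (1.12 s) and slower than Gurobipy (0.16 s). Since Pyomo calls Gurobi here through `SolverFactory('gurobi')`, the gap is most likely modeling-layer and interface overhead rather than a difference in solver efficiency.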

## 8. Cross-Solver Analysis and Final Recommendation

### Solver Results Comparison

| Solver | Status | Optimal Value | Execution Time | Decision Variables | Retry Attempt |
|--------|--------|---------------|----------------|-------------------|---------------|
| Gurobipy | OPTIMAL | 18.00 | 0.16s | N/A | N/A |
| DOCplex | OPTIMAL | 18.00 | 1.12s | N/A | N/A |
| Pyomo | OPTIMAL | 18.00 | 1.13s | N/A | N/A |

### Solver Consistency Analysis
**Result**: All solvers produced consistent results ✓
**Consistent Solvers**: gurobipy, docplex, pyomo
**Majority Vote Optimal Value**: 18.0
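
Because the example instance has only three songs, the solvers' common value can also be confirmed without any solver by enumerating all \( 2^3 = 8 \) storage decisions. A brute-force sketch, assuming the example data from Sections 5-7; it is only practical for tiny instances, since the number of candidates doubles with every added song:

```python
from itertools import product

# Example data from Sections 5-7.
SONGS = {
    1: {"size": 8, "rating": 4, "artist": "ArtistA", "genre": "Pop"},
    2: {"size": 12, "rating": 5, "artist": "ArtistB", "genre": "Rock"},
    3: {"size": 6, "rating": 3, "artist": "ArtistA", "genre": "Pop"},
}
MIN_TOTAL, MIN_AVG, MAX_PER_ARTIST, MIN_PER_GENRE = 2, 4, 1, 1


def feasible(x):
    """Check the four constraints (average rating in linearized form) for decision dict x."""
    chosen = [i for i in SONGS if x[i]]
    if len(chosen) < MIN_TOTAL:
        return False
    if sum(SONGS[i]["rating"] - MIN_AVG for i in chosen) < 0:
        return False
    artists = {s["artist"] for s in SONGS.values()}
    genres = {s["genre"] for s in SONGS.values()}
    if any(sum(SONGS[i]["artist"] == a for i in chosen) > MAX_PER_ARTIST for a in artists):
        return False
    return all(sum(SONGS[i]["genre"] == g for i in chosen) >= MIN_PER_GENRE for g in genres)


best_value, best_x = None, None
for bits in product((0, 1), repeat=len(SONGS)):
    x = dict(zip(SONGS, bits))  # song id -> 0/1
    if feasible(x):
        value = sum(SONGS[i]["size"] * x[i] for i in SONGS)
        if best_value is None or value < best_value:
            best_value, best_x = value, x

print(best_value, best_x)  # 18 {1: 0, 2: 1, 3: 1}
```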

### Final Recommendation
**Recommended Optimal Value**: 18.0
**Confidence Level**: HIGH
**Preferred Solver(s)**: gurobipy
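**Reasoning**: All three interfaces return the same optimal value, so the choice rests on practicality: Gurobipy had the shortest execution time (0.16 s versus about 1.1 s for DOCplex and Pyomo) and the only "high" reliability rating among the three runs.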

### Business Interpretation
**Overall Strategy**: The optimal solution minimizes the total file size of songs stored locally while satisfying all constraints, ensuring efficient use of storage resources.
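**Objective Value Meaning**: With the example data used in Sections 5-7, the optimal objective value of 18.0 is the minimum total file size, in MB, of the songs stored locally.
**Resource Allocation Summary**: Under the example data, the only selection that attains 18 MB stores songs 2 and 3 (12 MB + 6 MB) and leaves song 1 (8 MB) unstored. Song 2 is the only Rock song, so the genre constraint forces it in; ArtistA's songs 1 and 3 are limited to one, and the cheaper song 3 is chosen. Its rating of 3 is offset by song 2's rating of 5, giving an average of exactly 4, the required minimum.
**Implementation Recommendations**: Before acting on the result, replace the example data with the platform's actual song ratings, artist and genre assignments, and constraint values (the stored `genre` and `artist` limits of 10 and 5 differ from the example values), re-solve, store the selected songs locally, and re-run the optimization periodically as the catalogue changes.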
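
Once a selection is chosen, it could be recorded in the `decision_variables` table from Section 3. A sketch using Python's built-in `sqlite3` module, assuming that table is where storage decisions are persisted (the document does not say how they are consumed); `save_selection` is a hypothetical helper, and the demo loads the three stored rows into an in-memory database and writes back the example optimum, songs 2 and 3:

```python
import sqlite3


def save_selection(conn, selected_ids):
    """Mark the selected songs as stored locally and all other songs as not stored."""
    with conn:  # commits on success, rolls back if any statement fails
        conn.execute("UPDATE decision_variables SET is_stored_locally = 0")
        conn.executemany(
            "UPDATE decision_variables SET is_stored_locally = 1 WHERE song_id = ?",
            [(song_id,) for song_id in selected_ids],
        )


# In-memory demo with the rows from Section 3 (booleans stored as 0/1).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE decision_variables (song_id INTEGER, is_stored_locally BOOLEAN)")
conn.executemany("INSERT INTO decision_variables VALUES (?, ?)", [(1, 0), (2, 1), (3, 0)])
save_selection(conn, [2, 3])
rows = conn.execute(
    "SELECT song_id, is_stored_locally FROM decision_variables ORDER BY song_id"
).fetchall()
print(rows)  # [(1, 0), (2, 1), (3, 1)]
```

Resetting every row before setting the selected ones keeps the table consistent when a song drops out of the selection on a later run.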