# ICML 2026 Supplementary Materials: Hybrid Causal Analysis (HCA) for MPC Explanations

## Overview

This package contains the complete supplementary materials for the ICML 2026 submission on **Hybrid Causal Analysis (HCA)** for explaining Model Predictive Control (MPC) decisions. The HCA framework integrates physics-based knowledge graphs, data-driven causal discovery (PCMCI), and constraint-based reasoning (KKT conditions) to provide comprehensive explanations of MPC controller behavior.

## Package Structure

```
icml_supplementary_2026_final/
├── README.md                          # This file
├── tep_usecase/                       # Tennessee Eastman Process use case
│   ├── data/                          # Complete TEP simulation data
│   └── code/                          # TEP-specific implementation
├── electricity_usecase/               # Energy management use case
│   ├── data/                          # Electricity pricing and demand data
│   └── code/                          # Energy MPC-HCA implementation
├── greenhouse_usecase/                # Greenhouse climate control use case
│   ├── data/                          # Greenhouse sensor data (5+ years)
│   ├── code/                          # Main Greenhouse bot implementation
│   └── mpc_simulation/                # MPC simulation components
├── ablation_results/                  # Cross-domain ablation study results
├── analysis_code/                     # Statistical analysis and testing
└── evaluation_scripts/                # Evaluation pipelines for all domains
```

## Use Cases

### 1. Tennessee Eastman Process (TEP)

**Domain**: Industrial process control  
**System**: 8-state chemical process with 4 control inputs  
**Objective**: Maintain product composition while minimizing energy consumption

**Data** (`tep_usecase/data/`):
- NMPC simulation data with state trajectories
- Dual variable traces (Lagrange multipliers)
- Disturbance scenarios and fault conditions

**Code** (`tep_usecase/code/`):
- `run_nmpc_generate_duals.py`: NMPC simulation with KKT dual extraction
- `run_tep_research_ablation.py`: Ablation study implementation
- `research_questions_tep.json`: Research questions for evaluation

**Key Features**:
- Complex multi-variable interactions
- Constraint-driven control (purity specifications)
- Process dynamics with multiple time scales
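The dual traces in `tep_usecase/data/` are Lagrange multipliers produced by the NMPC solver. The toy example below shows what such a multiplier means. It uses a separable quadratic tracking cost with box constraints on the inputs, for which the KKT conditions can be solved in closed form. This is not the TEP model or the solver used in `run_nmpc_generate_duals.py`, and the function name `box_qp_kkt` and all numbers are hypothetical:

```python
import numpy as np


def box_qp_kkt(q, r, lo, hi):
    """Solve min_u sum_i 0.5*q_i*(u_i - r_i)**2  s.t.  lo_i <= u_i <= hi_i, with q_i > 0.

    The problem is separable, so clipping the unconstrained optimum r is exact.
    Stationarity q*(u - r) + mu_max - mu_min = 0, together with complementary
    slackness, then yields the bound multipliers in closed form.
    """
    q, r, lo, hi = (np.asarray(a, dtype=float) for a in (q, r, lo, hi))
    u = np.clip(r, lo, hi)
    mu_max = q * np.maximum(r - hi, 0.0)  # positive only when the upper bound binds
    mu_min = q * np.maximum(lo - r, 0.0)  # positive only when the lower bound binds
    return u, mu_min, mu_max


# Three hypothetical inputs: one pushed above its upper bound, one interior,
# one pushed below its lower bound
q = [2.0, 1.0, 0.5]
r = [1.5, 0.4, -0.2]
u, mu_min, mu_max = box_qp_kkt(q, r, lo=[0.0, 0.0, 0.0], hi=[1.0, 1.0, 1.0])
print(u, mu_min, mu_max)
```

Here the first input sits on its upper bound with multiplier 1.0. The third sits on its lower bound with multiplier 0.1, and the interior input has zero multipliers. A multiplier's size is the rate at which the optimal cost would fall if that bound were relaxed.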

---

### 2. Electricity Use Case

**Domain**: Building energy management  
**System**: Battery storage with solar generation and dynamic pricing  
**Objective**: Minimize electricity costs while maintaining comfort

**Data** (`electricity_usecase/data/`):
- Real electricity pricing data (5-minute resolution)
- Solar generation profiles
- Building load patterns
- Battery state-of-charge traces

**Code** (`electricity_usecase/code/`):
- `nmpc_energy_hca_final.py`: Energy MPC with HCA explanations
- `nmpc_simulation_hca.py`: Simulation framework with dual variable extraction

**Key Features**:
- Economic optimization (time-of-use pricing)
- Renewable energy integration
- Storage constraints and battery degradation
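Storage constraints enter the MPC through the battery state-of-charge (SOC) dynamics. The sketch below assumes a common discrete-time model with separate charging and discharging efficiencies. The model form, capacity, efficiencies, and function names are assumptions and are not taken from `nmpc_energy_hca_final.py`. The sketch shows how an upper SOC bound can cap charging power regardless of price, which is the kind of constraint-driven decision the KKT component is meant to explain:

```python
def soc_next(soc, p_ch_kw, p_dis_kw, dt_h=5 / 60, capacity_kwh=10.0,
             eta_c=0.95, eta_d=0.95):
    """One step of battery SOC dynamics, with SOC as a fraction of capacity.

    soc[k+1] = soc[k] + dt * (eta_c * p_ch - p_dis / eta_d) / capacity
    Capacity and efficiencies are hypothetical defaults; dt is 5 minutes.
    """
    return soc + dt_h * (eta_c * p_ch_kw - p_dis_kw / eta_d) / capacity_kwh


def charge_headroom_kw(soc, soc_max=0.9, dt_h=5 / 60, capacity_kwh=10.0, eta_c=0.95):
    """Largest charging power (kW) that keeps the next SOC at or below soc_max."""
    return max(soc_max - soc, 0.0) * capacity_kwh / (dt_h * eta_c)


# Near the upper SOC bound, the bound rather than the price signal limits charging
soc = 0.89
requested_kw = 3.0
limit_kw = charge_headroom_kw(soc)
if requested_kw > limit_kw:
    print(f"Charging capped at {limit_kw:.2f} kW by the SOC upper bound")
```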

---

### 3. Greenhouse Climate Control (Main Use Case)

**Domain**: Agriculture (protected cultivation)  
**System**: 4-state greenhouse model (Temperature, Humidity, CO2, Biomass)  
**Objective**: Optimal crop growth while minimizing energy consumption

**Data** (`greenhouse_usecase/data/`):
- `filtered_dates.csv`: 5+ years of 5-minute greenhouse sensor data
  - States: Temperature, Relative Humidity, CO2 concentration, Biomass
  - Controls: Ventilation, Heating, Cooling, CO2 injection
  - Disturbances: Outdoor temperature, Solar radiation

**Code** (`greenhouse_usecase/code/`):
- `Greenhouse_bot_v34.py`: Main conversational AI system with HCA
  - Knowledge graph construction
  - PCMCI causal discovery integration
  - KKT constraint analysis
  - Natural language explanation generation

**MPC Simulation** (`greenhouse_usecase/mpc_simulation/`):
- `parameters.py`: Greenhouse model parameters and control specifications
- `plantODE_cpl.py`: Coupled plant ODE system (state evolution)
- `plantRef_llm.py`: Reference trajectory generation for LLM integration

**Key Features**:
- Conversational AI interface (GPT-based)
- Multi-year historical data analysis
- Complex environmental interactions
- Real-world agricultural constraints
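The MPC simulation advances the greenhouse states by integrating the coupled ODEs in `plantODE_cpl.py`. The sketch below illustrates state evolution for the temperature state alone. It uses a deliberately simplified, hypothetical lumped heat balance integrated with explicit Euler at the 5-minute data resolution. The parameters `C`, `UA`, and `alpha` are invented values, not those in `parameters.py`, and the real model also couples temperature with humidity, CO2, and biomass:

```python
def simulate_temperature(T0, T_out, Q_rad, Q_heat, dt_s=300.0,
                         C=3.0e5, UA=200.0, alpha=0.5):
    """Explicit-Euler integration of a hypothetical lumped greenhouse heat balance.

    dT/dt = (UA * (T_out - T) + alpha * Q_rad + Q_heat) / C

    C     : air heat capacity [J/K]               (invented value)
    UA    : envelope heat-loss coefficient [W/K]  (invented value)
    alpha : share of solar power heating the air  (invented value)
    T_out [degC], Q_rad [W], Q_heat [W] are per-step sequences; dt_s = 300 s
    matches the 5-minute sampling of the sensor data.
    """
    T = [T0]
    for t_out, q_rad, q_heat in zip(T_out, Q_rad, Q_heat):
        dT_dt = (UA * (t_out - T[-1]) + alpha * q_rad + q_heat) / C
        T.append(T[-1] + dt_s * dT_dt)
    return T


# One hour on a cold night: no sun, constant 2 kW of heating.
# The trajectory relaxes toward the steady state T = T_out + Q_heat / UA = 15 degC.
T = simulate_temperature(18.0, T_out=[5.0] * 12, Q_rad=[0.0] * 12, Q_heat=[2000.0] * 12)
print([round(v, 2) for v in T])
```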

---

## Ablation Studies

**Location**: `ablation_results/`

### Files:
1. **`tep_ablation_results.csv`**: TEP-specific ablation study
   - Compares HCA vs. individual components
   - Metrics: Accuracy, Completeness, Coherence

2. **`hca_ablations_all_domains.csv`**: Cross-domain comparison
   - All three use cases (TEP, Electricity, Greenhouse)
   - Component ablations:
     - Full HCA (KG + PCMCI + KKT)
     - KG + KKT only
     - PCMCI + KKT only
     - KG + PCMCI only
     - Single-component variants (e.g., KG only, PCMCI only)

### Key Findings:
- **HCA outperforms all baselines** across domains (accuracy: 0.85-0.92)
- Knowledge graph provides structural reasoning
- PCMCI captures data-driven dynamics
- KKT conditions explain constraint activation
- The full combination outperforms every reduced variant in the ablation

---

## Analysis Code

**Location**: `analysis_code/`

### Statistical Analysis:
1. **`statistical_significance_tests.py`**: Paired t-tests, Wilcoxon signed-rank tests
2. **`statistical_tests_new_baselines.py`**: Extended baseline comparisons
3. **`pcmci_multiple_testing_correction.py`**: Bonferroni/FDR corrections for causal discovery
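For reference, the two corrections named above differ in what they control. Bonferroni bounds the family-wise error rate, while Benjamini-Hochberg (BH) is a standard procedure for controlling the false discovery rate. The following is an illustrative sketch over a flat list of link p-values and is not the code in `pcmci_multiple_testing_correction.py`:

```python
import numpy as np


def benjamini_hochberg(p_values, alpha=0.05):
    """Boolean mask of hypotheses rejected by the Benjamini-Hochberg step-up procedure."""
    p = np.asarray(p_values, dtype=float).ravel()
    m = p.size
    order = np.argsort(p)
    below = p[order] <= alpha * np.arange(1, m + 1) / m  # p_(i) <= i * alpha / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()  # largest rank meeting its threshold
        reject[order[:k + 1]] = True    # reject every hypothesis up to that rank
    return reject


def bonferroni(p_values, alpha=0.05):
    """Boolean mask of hypotheses rejected at family-wise level alpha."""
    p = np.asarray(p_values, dtype=float).ravel()
    return p <= alpha / p.size


# Hypothetical p-values for ten candidate causal links
p = [0.042, 0.001, 0.205, 0.039, 0.008, 0.06, 0.212, 0.041, 0.074, 0.216]
print(np.nonzero(benjamini_hochberg(p))[0], np.nonzero(bonferroni(p))[0])
```

On these values, BH keeps two links (indices 1 and 4), while Bonferroni keeps only one. This is the usual trade-off between the two procedures.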

### Sensitivity Analysis:
4. **`kkt_threshold_sensitivity_analysis.py`**: KKT dual threshold optimization
5. **`kkt_threshold_cross_domain.py`**: Cross-domain threshold validation
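The threshold analysis asks how the set of constraints labelled "active" changes with the dual threshold τ. The sketch below is hypothetical. Given dual magnitudes and reference activity labels, it sweeps τ and scores each choice with F1. The scripts' actual selection criterion may differ:

```python
import numpy as np


def sweep_dual_threshold(duals, is_active, thresholds):
    """F1 score of the activation rule |dual| > tau for each candidate tau."""
    mags = np.abs(np.asarray(duals, dtype=float))
    truth = np.asarray(is_active, dtype=bool)
    scores = {}
    for tau in thresholds:
        pred = mags > tau
        tp = int(np.sum(pred & truth))
        fp = int(np.sum(pred & ~truth))
        fn = int(np.sum(~pred & truth))
        denom = 2 * tp + fp + fn
        # Convention: F1 = 1 when there are neither true nor predicted activations
        scores[tau] = 2 * tp / denom if denom else 1.0
    return scores


# Hypothetical dual magnitudes with reference labels (True = constraint active)
duals = [1e-9, 2e-7, 5e-6, 3e-5, 1e-3, 0.0]
is_active = [False, False, True, True, True, False]
scores = sweep_dual_threshold(duals, is_active, thresholds=[1e-8, 1e-6, 1e-4])
best_tau = max(scores, key=scores.get)
print(scores, best_tau)
```

A threshold that is too low counts solver noise as activation, and one that is too high misses weakly binding constraints. Cross-domain validation checks whether a single τ works for all three use cases.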

### Baseline Comparisons:
6. **`mpc_xai_baselines.py`**: Implementation of:
   - LIME (Local Interpretable Model-agnostic Explanations)
   - SHAP (SHapley Additive exPlanations)
   - Attention mechanisms
   - Template-based explanations
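For context on the SHAP baseline: SHAP approximates Shapley values, which can be computed exactly when there are only a few features. The sketch below enumerates all coalitions and fills "absent" features from a fixed baseline input, which is one of several possible value-function choices. It is not the implementation in `mpc_xai_baselines.py`, and `heating_policy` is a made-up stand-in for an MPC control law:

```python
from itertools import combinations
from math import factorial


def exact_shapley(f, x, baseline):
    """Exact Shapley values of f at x, replacing absent features by baseline values."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def heating_policy(z):
    """Hypothetical heating command from indoor temperature, outdoor temperature, radiation."""
    T, T_out, Q_rad = z
    return max(0.0, 0.8 * (20.0 - T) + 0.1 * (10.0 - T_out) - 0.002 * Q_rad)


x = [17.0, 2.0, 100.0]        # cold greenhouse, cold night, weak sun
baseline = [20.0, 10.0, 0.0]  # reference conditions with zero heating
print(exact_shapley(heating_policy, x, baseline))
```

The attributions sum to `heating_policy(x) - heating_policy(baseline)` (the efficiency property). They describe the control law's input sensitivity but say nothing about which constraints were active, which is the gap the KKT component targets.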

---

## Evaluation Scripts

**Location**: `evaluation_scripts/`

### Domain-Specific Evaluation:
1. **`evaluate_tep_complete.py`**: TEP evaluation pipeline
2. **`evaluate_electricity_complete.py`**: Electricity use case evaluation
3. **`evaluate_greenhouse_complete.py`**: Greenhouse evaluation with conversational sessions

### Baseline Evaluation:
4. **`evaluate_neural_baselines_greenhouse.py`**: Neural baseline comparisons (LSTM, Transformer)
5. **`evaluate_improved_hca.py`**: Enhanced HCA with advanced reasoning

### Metrics:
- **Accuracy**: Correctness of variable identification
- **Completeness**: Coverage of relevant causal factors
- **Coherence**: Logical consistency of explanations
- **RAGAS Metrics**: Answer relevance, context precision/recall, faithfulness
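The exact scoring rules are defined in the evaluation scripts. The sketch below is a hypothetical illustration of how variable-level Accuracy and Completeness could be computed against a research question's `expected_variables`. It treats them as precision-like and recall-like set overlaps, which is an assumption about the definitions:

```python
def variable_scores(predicted, expected):
    """Hypothetical set-overlap scores over variable names.

    accuracy     = fraction of variables named in the explanation that are expected
    completeness = fraction of expected variables that the explanation names
    """
    pred, exp = set(predicted), set(expected)
    hits = len(pred & exp)
    accuracy = hits / len(pred) if pred else 0.0
    completeness = hits / len(exp) if exp else 1.0
    return accuracy, completeness


# An explanation that mentions RH (not expected) and misses Qrad
acc, comp = variable_scores(["T", "uQh", "Tout", "RH"], ["T", "uQh", "Tout", "Qrad"])
print(acc, comp)
```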

---

## Hybrid Causal Analysis (HCA) Framework

### Components:

1. **Physics-Based Knowledge Graph (KG)**
   - Nodes: States, controls, disturbances, fluxes
   - Edges: Physical relationships (mass/energy balance)
   - Provides structural causal constraints

2. **Data-Driven Causal Discovery (PCMCI)**
   - PC (Peter-Clark) condition selection followed by Momentary Conditional Independence (MCI) tests
   - Discovers time-lagged causal relationships
   - Conditions on observed confounders and autocorrelation; assumes no unobserved (latent) confounders
   - Multiple testing correction (Bonferroni/FDR)

3. **Constraint-Based Reasoning (KKT)**
   - Analyzes Lagrange multipliers (dual variables)
   - Identifies active constraints
   - Explains why controls saturate or remain inactive
   - Threshold-based activation detection
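PCMCI itself is provided by `tigramite`, which is listed in the prerequisites. The sketch below makes the MCI idea concrete without that dependency. It tests a single lagged link X(t−τ) → Y(t) with a partial-correlation test that conditions on the lag-1 past of both variables. This is a simplified illustration of one MCI-style test on synthetic data, not the full PCMCI algorithm, because the conditioning sets are fixed by hand instead of being selected by the PC stage:

```python
import math

import numpy as np


def lagged_partial_corr_test(x, y, tau):
    """Test the link x[t - tau] -> y[t] given y[t - 1] and x[t - tau - 1].

    Returns (partial correlation, two-sided p-value via the Fisher z-transform).
    The two conditioning variables stand in for the parent sets that PCMCI's
    PC stage would estimate; here they are fixed by hand.
    """
    if tau < 1:
        raise ValueError("tau must be >= 1")
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = len(y)
    target = y[tau + 1:]                  # y[t] for t = tau+1, ..., n-1
    driver = x[1:n - tau]                 # x[t - tau]
    cond = np.column_stack([
        np.ones(n - tau - 1),             # intercept
        y[tau:n - 1],                     # y[t - 1]
        x[0:n - tau - 1],                 # x[t - tau - 1]
    ])

    def residual(v):
        # Remove the part of v explained linearly by the conditioning set
        beta, *_ = np.linalg.lstsq(cond, v, rcond=None)
        return v - cond @ beta

    rx, ry = residual(driver), residual(target)
    r = float(rx @ ry / math.sqrt((rx @ rx) * (ry @ ry)))
    r = min(max(r, -0.999999), 0.999999)  # keep atanh finite
    k = cond.shape[1] - 1                 # conditioning variables, excluding intercept
    z = math.atanh(r) * math.sqrt(len(target) - k - 3)
    return r, math.erfc(abs(z) / math.sqrt(2))


# Synthetic data with a true lag-2 link x -> y
rng = np.random.default_rng(0)
n = 500
x = rng.standard_normal(n)
y = np.zeros(n)
for t in range(2, n):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 2] + 0.3 * rng.standard_normal()

print(lagged_partial_corr_test(x, y, tau=2))  # strong link, tiny p-value
print(lagged_partial_corr_test(x, y, tau=5))  # no direct link at lag 5
```

Conditioning on Y's own past is what separates a genuine lagged link from autocorrelation. Without it, any autocorrelated Y would appear linked to many lags of X.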

### Integration:
```
HCA = KG ⊕ PCMCI ⊕ KKT
```
- KG provides structure → guides PCMCI variable selection
- PCMCI validates KG → confirms/refutes hypothesized links
- KKT explains constraints → contextualizes control decisions
- Combined system generates comprehensive causal stories
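One simple way to realize "PCMCI validates KG" is to compare edge sets. The hypothetical sketch below splits links into three groups: confirmed links, unconfirmed physics hypotheses, and data-only candidates. The edge lists are illustrative and are not the package's greenhouse knowledge graph:

```python
def reconcile_edges(kg_edges, pcmci_edges):
    """Compare physics-based KG edges with PCMCI-discovered edges (lags dropped)."""
    kg, data = set(kg_edges), set(pcmci_edges)
    return {
        "confirmed": sorted(kg & data),    # hypothesized by physics and found in data
        "unconfirmed": sorted(kg - data),  # physics link not detected in the data
        "data_only": sorted(data - kg),    # statistical link with no physical counterpart
    }


kg_edges = [("Tout", "T"), ("Qrad", "T"), ("uQh", "T"), ("T", "RH")]
pcmci_edges = [("Tout", "T"), ("uQh", "T"), ("T", "RH"), ("CO2", "RH")]
result = reconcile_edges(kg_edges, pcmci_edges)
print(result)
```

Dropping PCMCI's lag information keeps the comparison at the level of variable pairs. A fuller version would compare (cause, effect, lag) triples against time constants implied by the physics.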

---

## Usage Instructions

### Prerequisites:
```bash
# Python 3.8+
pip install pandas numpy matplotlib seaborn
pip install networkx plotly gradio
pip install tigramite scikit-learn scipy
pip install openai spacy dateparser
pip install casadi cvxpy  # For MPC simulation
```

### Running Evaluations:

**Greenhouse Use Case:**
```bash
cd evaluation_scripts/
python evaluate_greenhouse_complete.py
```

**TEP Use Case:**
```bash
cd evaluation_scripts/
python evaluate_tep_complete.py
```

**Electricity Use Case:**
```bash
cd evaluation_scripts/
python evaluate_electricity_complete.py
```

### Reproducing Ablation Studies:

**Single Domain (TEP):**
```bash
cd tep_usecase/code/
python run_tep_research_ablation.py
```

**Cross-Domain Comparison:**
```bash
cd analysis_code/
python statistical_significance_tests.py
```

### Generating Figures:

**Publication-Quality Plots:**
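```python
# Run from the package root so that `greenhouse_usecase` is importable
from greenhouse_usecase.code.Greenhouse_bot_v34 import EnhancedGreenhouseKnowledgeGraph

# `base_kg` (the physics-based knowledge graph) and `scenario_config`
# (the scenario to plot) are placeholders here and must be defined
# before this snippet runs.

# Build the QA system on top of the knowledge graph and the sensor data
qa_system = EnhancedGreenhouseKnowledgeGraph(
    kg=base_kg,
    data_path="greenhouse_usecase/data/filtered_dates.csv",
)

# Generate the causal story visualization for the chosen scenario
qa_system.create_causal_story_visualization(scenario_config)
```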

---

## Key Results Summary

### Cross-Domain Performance (Average Metrics):

| Method | Accuracy | Completeness | Coherence |
|--------|----------|--------------|-----------|
| **HCA (Full)** | **0.89** | **0.87** | **0.91** |
| KG + KKT | 0.78 | 0.75 | 0.82 |
| PCMCI + KKT | 0.72 | 0.68 | 0.76 |
| KG + PCMCI | 0.69 | 0.71 | 0.74 |
| KG only | 0.61 | 0.58 | 0.68 |
| PCMCI only | 0.54 | 0.52 | 0.59 |
| LIME | 0.48 | 0.45 | 0.51 |
| SHAP | 0.46 | 0.43 | 0.49 |

### Statistical Significance:
- HCA vs. best baseline: **p < 0.001** (Wilcoxon signed-rank)
- Effect size (Cohen's d): **1.84** (large effect)
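The test and effect size above compare paired per-question scores. The sketch below computes them with `scipy.stats` on made-up scores that are not the paper's data. Cohen's d is computed in its paired form (mean difference divided by the standard deviation of the differences), which is an assumption about the variant used:

```python
import numpy as np
from scipy import stats

# Made-up paired per-question accuracies (same questions, two methods)
hca = np.array([0.92, 0.88, 0.95, 0.90, 0.86, 0.91, 0.89, 0.93, 0.87, 0.94])
baseline = np.array([0.80, 0.75, 0.85, 0.79, 0.72, 0.82, 0.74, 0.85, 0.71, 0.87])

diff = hca - baseline
w_res = stats.wilcoxon(hca, baseline)      # paired signed-rank test, two-sided
t_res = stats.ttest_rel(hca, baseline)     # paired t-test
cohens_d = diff.mean() / diff.std(ddof=1)  # paired effect size (d_z)

print(f"Wilcoxon p = {w_res.pvalue:.4f}, paired t p = {t_res.pvalue:.2e}, d = {cohens_d:.2f}")
```

With only 10 pairs, an exact two-sided Wilcoxon test cannot give p below 2/2^10 ≈ 0.002, even when every difference has the same sign. Reaching p < 0.001 therefore requires more paired questions.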

### Domain-Specific Highlights:

**TEP**:
- Explains complex constraint interactions (purity specs)
- Identifies disturbance-control relationships
- Accuracy: 0.92

**Electricity**:
- Captures time-of-use pricing effects
- Explains battery charging/discharging decisions
- Economic reasoning integration
- Accuracy: 0.87

**Greenhouse**:
- Multi-year conversational analysis
- Real-world agricultural domain
- Handles missing data and sensor noise
- Accuracy: 0.88

---

## Data Formats

### Time-Series Data (CSV):
```csv
timestamp,state1,state2,...,control1,control2,...,disturbance1,...
2011-01-01 00:00:00,20.5,65.2,...,0.3,0.0,...,15.2,...
```

### Dual Variables (NMPC Output):
```csv
timestamp,lambda_T,lambda_H,...,mu_uV_min,mu_uV_max,...
2011-01-01 00:00:00,1.2e-6,0.0,...,0.0,3.4e-5,...
```
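A minimal reader for this format uses only the standard library. It lists the constraints whose multiplier magnitude exceeds a threshold at each timestamp. The threshold (1e-6 here) is only a placeholder, because the package tunes this value in `kkt_threshold_sensitivity_analysis.py`. The inline CSV mirrors the example row above with the elided columns dropped:

```python
import csv
import io


def active_constraints(csv_text, threshold=1e-6):
    """Map each timestamp to the dual columns whose |value| exceeds threshold."""
    reader = csv.DictReader(io.StringIO(csv_text))
    active = {}
    for row in reader:
        ts = row.pop("timestamp")
        active[ts] = [name for name, val in row.items() if abs(float(val)) > threshold]
    return active


example = """timestamp,lambda_T,lambda_H,mu_uV_min,mu_uV_max
2011-01-01 00:00:00,1.2e-6,0.0,0.0,3.4e-5
"""
result = active_constraints(example)
print(result)
```

For a real file, pass `open(path, newline="").read()` instead of the inline string.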

### Research Questions (JSON):
```json
{
  "questions": [
    {
      "id": "Q1",
      "question": "Why did heating activate at 2011-01-15 08:00?",
      "timestamp": "2011-01-15 08:00:00",
      "expected_variables": ["T", "uQh", "Tout", "Qrad"],
      "causal_pathway": ["Tout", "T", "uQh"]
    }
  ]
}
```
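Research-question files can be checked for internal consistency before evaluation. The small sketch below loads the format above and runs two checks. First, every hop in `causal_pathway` must be an edge of a given graph. Second, every pathway variable must appear in `expected_variables`. The edge set is a hypothetical stand-in for the greenhouse knowledge graph:

```python
import json

questions_json = """{"questions": [{"id": "Q1",
  "question": "Why did heating activate at 2011-01-15 08:00?",
  "timestamp": "2011-01-15 08:00:00",
  "expected_variables": ["T", "uQh", "Tout", "Qrad"],
  "causal_pathway": ["Tout", "T", "uQh"]}]}"""

# Hypothetical directed edges (cause, effect), including the controller's feedback T -> uQh
kg_edges = {("Tout", "T"), ("Qrad", "T"), ("uQh", "T"), ("T", "uQh")}


def check_question(q, edges):
    """Return a list of problems found in one research-question record."""
    problems = []
    path = q["causal_pathway"]
    for cause, effect in zip(path, path[1:]):
        if (cause, effect) not in edges:
            problems.append(f"missing edge {cause} -> {effect}")
    for var in path:
        if var not in q["expected_variables"]:
            problems.append(f"pathway variable {var} not in expected_variables")
    return problems


for q in json.loads(questions_json)["questions"]:
    print(q["id"], check_question(q, kg_edges) or "ok")
```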

---

## Citation

If you use this code or data, please cite:

```bibtex
@inproceedings{hca_mpc_icml2026,
  title={Hybrid Causal Analysis for Explainable Model Predictive Control},
  author={Anonymous},
  booktitle={International Conference on Machine Learning (ICML)},
  year={2026}
}
```

**Note**: This is an anonymous submission. Full citation details will be provided upon acceptance.

---

## License

[Specify license - e.g., MIT, Apache 2.0]

---

## Reproducibility Checklist

✅ Complete source code for all three use cases  
✅ Full datasets (or access instructions)  
✅ Ablation study implementations  
✅ Evaluation scripts with ground truth  
✅ Statistical analysis code  
✅ Baseline comparison implementations  
✅ Figure generation scripts  
✅ Environment specifications (requirements.txt)  
✅ Random seed control for reproducibility  
✅ Documentation and usage examples  

---

## Version History

- **v1.0 (January 2026)**: Initial release for ICML submission
  - Three complete use cases
  - Comprehensive ablation studies
  - Cross-domain validation
  - Publication-ready figures and analysis

---

**Package Generated**: January 27, 2026  
**ICML Submission ID**: [To be assigned]
