# Hierarchical Kolmogorov-Arnold Networks (HKAN) for Heart Disease Prediction

This repository contains the implementation of Hierarchical Kolmogorov-Arnold Networks (HKAN) for binary heart-disease classification. It includes several feature-grouping strategies and uses Bayesian optimization for hyperparameter tuning.

## Project Structure

```
exp/
├── config.py                       # Configuration management
├── data_loader.py                  # Data loading utilities
├── models.py                       # Model definitions (HKAN, Pure KAN)
├── training_utils.py               # Training and evaluation utilities
├── bayesian_ea_hkan.py             # HKAN with best EA-discovered groups, Bayesian optimization
├── bayesian_pure_kan.py            # Pure KAN with Bayesian optimization
├── bayesian_mi_hkan.py             # Mutual Information HKAN with Bayesian optimization
├── evolutionary_algorithm_hkan.py  # Evolutionary algorithm for feature grouping
├── demo_hkan_english.py            # Demo with 8:2 train/test split (no validation set)
├── comparison_kan_kfold.py         # KAN baseline comparison (5-fold CV)
├── comparison_mlp_kfold.py         # MLP baseline comparison (5-fold CV)
├── comparison_xgb_kfold.py         # XGBoost baseline comparison (5-fold CV)
├── requirements.txt                # Python dependencies
└── README.md                       # This file
```

## Installation

1. **Copy the repository** and change into the `exp/` directory; the commands below are run from there.

2. **Install dependencies**:
```bash
pip install -r requirements.txt
```

3. **Install KAN library**:
```bash
pip install pykan
```

## Dataset

The code expects a `heart_disease.csv` file with the following columns (13 input features plus the label):
- `age`, `sex`, `cp`, `trestbps`, `chol`, `fbs`, `restecg`
- `thalach`, `exang`, `oldpeak`, `slope`, `ca`, `thal`
- `target` (binary label: 0 = no disease, 1 = disease)
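Before running any script it can help to confirm that the CSV matches this layout. The following is a minimal sketch using only the standard library; `check_heart_csv` is a hypothetical helper, not part of the repository, and it assumes the label is stored as the literal strings `0` or `1`.

```python
import csv
from typing import Iterable, List

# Column layout described in this README.
FEATURES = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
            "thalach", "exang", "oldpeak", "slope", "ca", "thal"]
TARGET = "target"


def check_heart_csv(lines: Iterable[str]) -> List[str]:
    """Hypothetical helper: return a list of problems (empty list = looks usable).

    Assumes the label column holds the strings "0" or "1".
    """
    reader = csv.DictReader(lines)
    header = reader.fieldnames or []
    problems = [f"missing column: {c}" for c in FEATURES + [TARGET] if c not in header]
    if problems:
        return problems  # no point checking rows if columns are missing
    for row_no, row in enumerate(reader, start=2):  # line 1 is the header
        label = (row.get(TARGET) or "").strip()
        if label not in {"0", "1"}:
            problems.append(f"row {row_no}: target must be 0 or 1, got {label!r}")
    return problems
```

To check the real file, open it with `open("heart_disease.csv", newline="")` inside a `with` block and pass the file handle; an empty list means all 14 columns are present and every label is 0 or 1.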

## Usage

### Basic Usage

**Run EA-HKAN with Bayesian optimization**:
```bash
python bayesian_ea_hkan.py --data-path heart_disease.csv --results-dir results
```

**Run Pure KAN with Bayesian optimization**:
```bash
python bayesian_pure_kan.py --data-path heart_disease.csv --results-dir results
```

**Run MI-HKAN with Bayesian optimization**:
```bash
python bayesian_mi_hkan.py --data-path heart_disease.csv --results-dir results
```

**Run Demo HKAN (8:2 train/test split, no validation set)**:
```bash
python demo_hkan_english.py --data-path heart_disease.csv
```

**Run baseline comparisons (5-fold cross-validation)**:
```bash
# Compare KAN performance: all features vs HKAN-selected features
python comparison_kan_kfold.py --data-path heart_disease.csv

# Compare MLP performance: all features vs HKAN-selected features
python comparison_mlp_kfold.py --data-path heart_disease.csv

# Compare XGBoost performance: all features vs HKAN-selected features
python comparison_xgb_kfold.py --data-path heart_disease.csv
```

### Advanced Configuration

You can modify hyperparameters in `config.py` or override settings with command-line arguments, for example:

```bash
python bayesian_ea_hkan.py \
    --data-path /path/to/heart_disease.csv \
    --results-dir /path/to/results \
    --seed 42
```
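One common way to combine file-based defaults with command-line overrides is a dataclass of defaults plus `argparse`. The sketch below is hypothetical: `ExperimentConfig` and `parse_config` are illustrative names, the default values are assumptions, and only the three flags shown above are modeled; the repository's `config.py` may be organized differently.

```python
import argparse
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ExperimentConfig:
    # Hypothetical sketch, not the repository's config.py; defaults are assumptions.
    data_path: str = "heart_disease.csv"
    results_dir: str = "results"
    seed: int = 42


def parse_config(argv: Optional[List[str]] = None) -> ExperimentConfig:
    """Start from the dataclass defaults and let CLI flags override them."""
    defaults = ExperimentConfig()
    parser = argparse.ArgumentParser(description="HKAN experiment (sketch)")
    parser.add_argument("--data-path", default=defaults.data_path)
    parser.add_argument("--results-dir", default=defaults.results_dir)
    parser.add_argument("--seed", type=int, default=defaults.seed)
    args = parser.parse_args(argv)  # argparse maps --data-path to args.data_path
    return ExperimentConfig(data_path=args.data_path,
                            results_dir=args.results_dir,
                            seed=args.seed)
```

Keeping every default in one dataclass means a run without flags and a run with flags go through the same code path, and the resulting config object can be logged alongside the results.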

### Evolutionary Algorithm for Feature Grouping

Run the evolutionary algorithm to search for high-performing feature groups:
```bash
python evolutionary_algorithm_hkan.py --data-path heart_disease.csv
```
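The operators used by `evolutionary_algorithm_hkan.py` are not described in this README, so the following is only a generic sketch of an evolutionary search over feature groupings, under stated assumptions: a grouping is a fixed number of (possibly overlapping) feature lists, parents come from truncation selection, crossover swaps whole groups, mutation adds or drops one feature, and the best individuals are carried over unchanged (elitism). All function names are hypothetical, and `fitness` is a placeholder; in this project it would presumably train an HKAN on the grouping and return a validation score such as AUC.

```python
import random
from typing import Callable, List, Sequence

# Generic EA sketch with hypothetical operators; not the repository's implementation.
Grouping = List[List[str]]


def random_grouping(features: Sequence[str], n_groups: int, rng: random.Random,
                    min_size: int = 2, max_size: int = 5) -> Grouping:
    """Each group is a random subset; groups may share features."""
    return [rng.sample(list(features), rng.randint(min_size, max_size))
            for _ in range(n_groups)]


def crossover(a: Grouping, b: Grouping, rng: random.Random) -> Grouping:
    """Uniform crossover at the group level: each slot comes from one parent."""
    return [list(ga if rng.random() < 0.5 else gb) for ga, gb in zip(a, b)]


def mutate(grouping: Grouping, features: Sequence[str], rng: random.Random,
           min_size: int = 2) -> Grouping:
    """Add a missing feature to, or drop a feature from, one random group."""
    child = [list(g) for g in grouping]
    group = rng.choice(child)
    missing = [f for f in features if f not in group]
    if missing and (len(group) <= min_size or rng.random() < 0.5):
        group.append(rng.choice(missing))
    elif len(group) > min_size:
        group.remove(rng.choice(group))
    return child


def evolve(features: Sequence[str], fitness: Callable[[Grouping], float],
           n_groups: int = 4, pop_size: int = 20, generations: int = 30,
           elite: int = 2, seed: int = 0) -> Grouping:
    """Return the best grouping found; higher fitness is better."""
    rng = random.Random(seed)
    population = [random_grouping(features, n_groups, rng) for _ in range(pop_size)]
    for _ in range(generations):
        # In practice, cache fitness values: each call may mean training a model.
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: max(2, pop_size // 2)]  # truncation selection
        next_gen = [[list(g) for g in ind] for ind in ranked[:elite]]  # elitism
        while len(next_gen) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            next_gen.append(mutate(crossover(p1, p2, rng), features, rng))
        population = next_gen
    return max(population, key=fitness)
```

A cheap toy fitness such as feature coverage, `lambda g: len(set().union(*g))`, is enough to exercise the loop before plugging in model training.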

## Model Architecture

### HKAN (Hierarchical KAN)
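```
Input Features → Feature Groups → Sub-KANs → Fusion KAN → Output

                ┌→ [Group1: 5] → [SubKAN1] ─┐
[13 features] ──┼→ [Group2: 4] → [SubKAN2] ─┼→ [Fusion] → [Binary]
                └→ [Group3: 4] → [SubKAN3] ─┘
```

The three groups and their sizes (5, 4, 4) are illustrative; the actual number of groups depends on the grouping strategy, and groups may share features (see Feature Grouping Strategies).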
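To make the data flow concrete, here is a forward-pass-only NumPy sketch of the hierarchy: each group selects its columns, a small KAN-style layer maps them to a few outputs, the outputs are concatenated, and a two-layer fusion network produces a probability. It is not the repository's `models.py`: it replaces pykan's B-spline edge functions with fixed Gaussian radial basis functions, has no training loop, and the names `RBFKANLayer`, `HKANSketch`, `sub_out`, and `fusion_hidden` are hypothetical. Inputs are assumed to be standardized to roughly [-1, 1].

```python
import numpy as np

FEATURES = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
            "thalach", "exang", "oldpeak", "slope", "ca", "thal"]
# EA-discovered groups listed in this README (groups may overlap).
EA_GROUPS = [["cp", "restecg", "thalach", "exang", "ca"],
             ["fbs", "thalach", "oldpeak", "ca"],
             ["age", "trestbps", "chol", "slope", "thal"],
             ["sex", "slope", "ca"]]


class RBFKANLayer:
    """Hypothetical KAN-style layer: edge (i -> j) applies its own univariate
    function phi_ij(x) = sum_k c_ijk * exp(-((x - g_k) / h)^2); node j sums over i.
    Gaussian RBFs stand in for pykan's B-splines to keep the sketch short."""

    def __init__(self, in_dim, out_dim, grid_size=5, rng=None):
        rng = rng if rng is not None else np.random.default_rng(0)
        self.centers = np.linspace(-1.0, 1.0, grid_size)  # g_k, shape (G,)
        self.width = 2.0 / (grid_size - 1)                 # h
        self.coef = rng.normal(0.0, 0.1, size=(in_dim, out_dim, grid_size))  # c_ijk

    def __call__(self, x):
        # x: (N, in_dim) -> basis: (N, in_dim, G)
        basis = np.exp(-((x[..., None] - self.centers) / self.width) ** 2)
        # Sum over inputs i and basis functions k -> (N, out_dim)
        return np.einsum("nik,iok->no", basis, self.coef)


class HKANSketch:
    """Hypothetical forward pass: groups -> sub-KANs -> concatenate -> fusion -> sigmoid."""

    def __init__(self, groups, feature_names, sub_out=2, fusion_hidden=4,
                 grid_size=5, seed=0):
        rng = np.random.default_rng(seed)
        self.indices = [[feature_names.index(f) for f in g] for g in groups]
        self.sub_kans = [RBFKANLayer(len(g), sub_out, grid_size, rng) for g in groups]
        self.fusion = [RBFKANLayer(sub_out * len(groups), fusion_hidden, grid_size, rng),
                       RBFKANLayer(fusion_hidden, 1, grid_size, rng)]

    def predict_proba(self, x):
        # Each sub-KAN sees only its own group's columns.
        h = np.concatenate([kan(x[:, idx]) for kan, idx in zip(self.sub_kans, self.indices)],
                           axis=1)
        for layer in self.fusion:
            h = layer(h)
        return 1.0 / (1.0 + np.exp(-h[:, 0]))  # P(target = 1)
```

Training would add a loss such as binary cross-entropy and gradient updates on each layer's `coef`; the sketch only shows how the groups route features through the hierarchy.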

### Pure KAN
```
Input Features → KAN Network → Output
     ↓              ↓            ↓
[13 features] → [Hidden Layer] → [Binary]
```

## Results

Results are saved automatically in the results directory (set with `--results-dir`):

- `*_results.json`: Best hyperparameters and performance metrics
- `all_trials.csv`: Complete optimization history
- `final_model_results.json`: Detailed final model evaluation

## Feature Grouping Strategies

### 1. EA-Discovered Groups (EA-HKAN)
Groups discovered through evolutionary algorithm optimization (groups may overlap; for example, `ca` appears in three of them):
- Group 0: ['cp', 'restecg', 'thalach', 'exang', 'ca']
- Group 1: ['fbs', 'thalach', 'oldpeak', 'ca']
- Group 2: ['age', 'trestbps', 'chol', 'slope', 'thal']
- Group 4: ['sex', 'slope', 'ca']

### 2. Mutual Information Groups (MI-HKAN)
Features are ranked by their mutual information with the target and then assigned to groups in round-robin order.
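A minimal sketch of one reading of "round-robin": rank features by their MI score and deal them out to the groups in turn, so each group receives a mix of strong and weak features. The scores are taken as an input here; one way to obtain them is scikit-learn's `mutual_info_classif(X, y)`. The function name `round_robin_groups` is hypothetical, and the repository's assignment order may differ.

```python
from typing import Dict, List


def round_robin_groups(mi_scores: Dict[str, float], n_groups: int) -> List[List[str]]:
    """Hypothetical helper: deal features, highest MI first, into n_groups groups."""
    ranked = sorted(mi_scores, key=mi_scores.get, reverse=True)
    groups: List[List[str]] = [[] for _ in range(n_groups)]
    for rank, feature in enumerate(ranked):
        groups[rank % n_groups].append(feature)  # group 0, 1, ..., k-1, 0, 1, ...
    return groups
```

For example, `round_robin_groups({"a": 0.5, "b": 0.4, "c": 0.3, "d": 0.2, "e": 0.1}, 2)` returns `[["a", "c", "e"], ["b", "d"]]`.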

### 3. Random Groups (Random-HKAN)
Baseline random grouping for comparison.
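A random baseline can be as simple as shuffling the features and splitting them into k groups. The sketch below assumes disjoint, near-equal groups and draws k from 3 to 6 when it is not given (the range listed for Random-HKAN under Hyperparameter Optimization); `random_groups` is a hypothetical name and the repository may sample groups differently.

```python
import random
from typing import List, Optional, Sequence


def random_groups(features: Sequence[str], n_groups: Optional[int] = None,
                  seed: int = 0) -> List[List[str]]:
    """Hypothetical helper: shuffle features and split them into disjoint groups."""
    rng = random.Random(seed)
    k = n_groups if n_groups is not None else rng.randint(3, 6)  # inclusive range
    shuffled = list(features)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]  # stride slicing: sizes differ by at most 1
```

Fixing `seed` makes the baseline reproducible, so the same random grouping can be compared against the EA and MI groupings across runs.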

## Hyperparameter Optimization

All models use Optuna for Bayesian optimization with **100 trials** each:

**Common Parameters**:
- Learning rate: [1e-4, 1e-2] (log scale)
- KAN grid size: [3, 20]
- Regularization parameters: [1e-5, 1e-2] (log scale)

**HKAN-specific**:
- Number of groups: fixed at 4 for EA and MI; 3 to 6 for Random
- Fusion layer size: [4, 16]
- Factor regularization: [1e-4, 1e-1] (log scale)
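The ranges above map directly onto Optuna's `trial.suggest_float(..., log=True)` and `trial.suggest_int(...)`. The sketch below writes one HKAN search space that way; the parameter names are hypothetical, a single `reg` coefficient stands in for the "regularization parameters" listed above, and `DryRunTrial` is an illustrative stand-in exposing the same two methods so the function can be exercised without Optuna installed.

```python
import math
import random
from typing import Any, Dict


def suggest_hkan_params(trial: Any) -> Dict[str, Any]:
    """Sample one HKAN configuration from the ranges in this README.

    `trial` is expected to behave like optuna.trial.Trial; names are hypothetical.
    """
    return {
        "lr": trial.suggest_float("lr", 1e-4, 1e-2, log=True),
        "grid": trial.suggest_int("grid", 3, 20),
        "reg": trial.suggest_float("reg", 1e-5, 1e-2, log=True),
        "fusion_hidden": trial.suggest_int("fusion_hidden", 4, 16),
        "factor_reg": trial.suggest_float("factor_reg", 1e-4, 1e-1, log=True),
    }


class DryRunTrial:
    """Hypothetical stand-in: samples with the stdlib (log-uniform when log=True)."""

    def __init__(self, seed: int = 0) -> None:
        self.rng = random.Random(seed)

    def suggest_float(self, name: str, low: float, high: float, log: bool = False) -> float:
        if log:
            return math.exp(self.rng.uniform(math.log(low), math.log(high)))
        return self.rng.uniform(low, high)

    def suggest_int(self, name: str, low: int, high: int) -> int:
        return self.rng.randint(low, high)  # inclusive on both ends, like Optuna
```

With Optuna installed, the same function plugs into a study: `study = optuna.create_study(direction="maximize")` followed by `study.optimize(lambda t: evaluate(suggest_hkan_params(t)), n_trials=100)`, where `evaluate` is a hypothetical function that trains HKAN and returns validation AUC. `study.trials_dataframe()` returns the trial history as a pandas DataFrame.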

**Baseline Comparisons**:
- 5-fold cross-validation for more stable performance estimates than a single split
- Compares all features vs HKAN-selected features
- Models: KAN, MLP, XGBoost
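The exact splitting used by the comparison scripts is not described here. As a sketch, assuming stratified folds (each fold keeps roughly the dataset's class ratio), 5-fold index generation needs only the standard library; a comparison would then train the same model on all 13 columns and on the selected subset for each fold and report the mean and standard deviation of the fold scores. `stratified_kfold` is a hypothetical helper; scikit-learn's `StratifiedKFold` is a better-tested equivalent.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def stratified_kfold(y: Sequence[int], k: int = 5,
                     seed: int = 42) -> Iterator[Tuple[List[int], List[int]]]:
    """Hypothetical helper: yield (train_idx, test_idx); each class spread over folds."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)  # deal this class's samples across folds
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test
```

Using the same folds for the all-features run and the selected-features run keeps the comparison paired, so differences come from the feature sets rather than from different splits.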

## Evaluation Metrics

- **AUC**: Area Under ROC Curve (primary metric)
- **Accuracy**: Classification accuracy
- **F1 Score**: Harmonic mean of precision and recall
- **FQS**: Factor Quality Score (for HKAN models)
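For reference, the three standard metrics can be computed from scratch as below (a sketch; scikit-learn provides `roc_auc_score`, `accuracy_score`, and `f1_score`). AUC is computed as the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as one half; the 0.5 threshold used in the example to turn scores into labels is an assumption. FQS is specific to this project and is not reproduced here.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """P(score of a random positive > score of a random negative); ties count 1/2.

    O(n_pos * n_neg), which is fine for small tabular datasets.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined unless both classes are present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For labels `[0, 0, 1, 1]` and scores `[0.1, 0.4, 0.35, 0.8]`, the AUC is 0.75; thresholding at 0.5 gives predictions `[0, 0, 0, 1]`, accuracy 0.75, and F1 = 2/3.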

## File Organization

### Main Experiment Files
All experiment scripts are written in English and follow a modular design built on the shared modules below:

- **`bayesian_ea_hkan.py`**: EA-discovered feature groups with Bayesian optimization
- **`bayesian_pure_kan.py`**: Pure KAN baseline with Bayesian optimization
- **`bayesian_mi_hkan.py`**: Mutual Information feature groups with Bayesian optimization
- **`evolutionary_algorithm_hkan.py`**: Evolutionary algorithm for discovering high-performing feature groups
- **`demo_hkan_english.py`**: Demo version with an 8:2 train/test split (no validation set)
- **`comparison_*_kfold.py`**: 5-fold cross-validation comparison with various baselines

### Shared Modules
- **`config.py`**: Centralized configuration management
- **`data_loader.py`**: Data loading with an 8:1:1 train/validation/test split
- **`models.py`**: Unified model definitions (HKAN, Pure KAN)
- **`training_utils.py`**: Training and evaluation utilities
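The 8:1:1 split can be sketched as a seeded shuffle of row indices followed by slicing. This is an assumption about how `data_loader.py` works (it may, for example, stratify by `target`), and `split_indices` is a hypothetical name. The demo's 8:2 split corresponds to `val_frac=0.0`.

```python
import random
from typing import List, Tuple


def split_indices(n: int, train_frac: float = 0.8, val_frac: float = 0.1,
                  seed: int = 42) -> Tuple[List[int], List[int], List[int]]:
    """Hypothetical helper: shuffle 0..n-1 and cut into train / validation / test.

    The test set takes whatever remains after train and validation.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
```

Letting the test set absorb the rounding remainder guarantees that every row lands in exactly one split.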

---

**Happy experimenting with HKAN! 🚀**