# Supplementary Material: MaxSAT-Based Compression for Tsetlin Machines

This folder contains the experimental data and code needed to reproduce the results reported in the paper.

## Structure

```
SUPPLEMENT/
├── README.md                 # This file
├── pyproject.toml           # Python dependencies
├── scripts/                  # Experiment code
│   ├── run_imli.py          # MaxSAT compression (main method)
│   ├── run_baseline.py      # TM baseline training
│   ├── run_random_pruning.py
│   ├── run_greedy_pruning.py
│   ├── run_knowledge_distillation.py
│   ├── utils.py             # Dataset loaders
│   └── aggregate_all.py     # Result aggregation
└── results/
    ├── aggregated/          # Summary statistics
    │   ├── shortlist_summary.json   # 13-dataset results (main paper)
    │   └── master_summary.json      # All experiments
    └── experiments/         # Individual experiment logs (390 files)
```

## Quick Start

```bash
# Install dependencies (requires uv: https://docs.astral.sh/uv/)
uv sync

# Run a single experiment
uv run scripts/run_imli.py -d breast-cancer -p 32 -s 42 --weighted --results-dir results/

# Run the TM baseline for comparison
uv run scripts/run_baseline.py -d breast-cancer -c 100 -s 42 --results-dir results/
```

## Datasets (13 in paper)

| Dataset | Samples | Features | Source |
|---------|---------|----------|--------|
| spect-heart | 267 | 22 | UCI |
| banknote | 1,372 | 4 | UCI |
| breast-cancer | 569 | 30 | UCI |
| phishing | 11,055 | 10 | UCI |
| electricity | 45,312 | 8 | OpenML |
| tictactoe | 958 | 9 | UCI |
| spambase | 4,601 | 57 | UCI |
| magic | 19,020 | 10 | UCI |
| kr-vs-kp | 3,196 | 36 | UCI |
| higgs-100k | 100,000 | 28 | UCI (subsampled) |
| nursery | 12,960 | 8 | UCI (appendix) |
| mushroom | 8,124 | 22 | UCI (appendix) |
| car | 1,728 | 6 | UCI (appendix) |

## Experiment Results

### Main Comparison: MaxSAT vs Matched TM

| Dataset | MaxSAT Acc | Matched TM | Delta |
|---------|-----------|------------|-------|
| spect-heart | 70.7% | 44.4% | +26.3pp |
| banknote | 87.1% | 67.1% | +20.0pp |
| breast-cancer | 78.1% | 60.9% | +17.2pp |
| phishing | 76.7% | 67.1% | +9.6pp |
| electricity | 70.8% | 64.7% | +6.1pp |
| tictactoe | 82.3% | 77.1% | +5.2pp |
| spambase | 85.1% | 81.7% | +3.4pp |
| magic | 76.2% | 72.1% | +4.1pp |
| kr-vs-kp | 95.4% | 91.5% | +3.9pp |
| higgs-100k | 60.0% | 59.3% | +0.7pp |

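Record: 12 wins and 1 tie for MaxSAT across all 13 datasets. The table above lists only 10 of them; the three appendix datasets (nursery, mushroom, car) are counted in the record but not shown here.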
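
The Delta column is the MaxSAT accuracy minus the matched TM accuracy, in percentage points. The following is a minimal sketch for rechecking the rows shown in the table. The names `TABLE`, `delta_pp`, and `record` are illustrative and not part of the repository scripts, and `tie_tol` is a hypothetical threshold, since this README does not say how a tie is defined.

```python
# Accuracies (%) copied from the table above: (MaxSAT, matched TM).
TABLE = {
    "spect-heart": (70.7, 44.4),
    "banknote": (87.1, 67.1),
    "breast-cancer": (78.1, 60.9),
    "phishing": (76.7, 67.1),
    "electricity": (70.8, 64.7),
    "tictactoe": (82.3, 77.1),
    "spambase": (85.1, 81.7),
    "magic": (76.2, 72.1),
    "kr-vs-kp": (95.4, 91.5),
    "higgs-100k": (60.0, 59.3),
}


def delta_pp(maxsat_acc, tm_acc):
    """Accuracy difference in percentage points, rounded to one decimal."""
    return round(maxsat_acc - tm_acc, 1)


def record(table, tie_tol=0.0):
    """Count (wins, ties, losses) for MaxSAT.

    tie_tol is an assumed tolerance: |delta| <= tie_tol counts as a tie.
    """
    wins = ties = losses = 0
    for maxsat_acc, tm_acc in table.values():
        d = delta_pp(maxsat_acc, tm_acc)
        if abs(d) <= tie_tol:
            ties += 1
        elif d > 0:
            wins += 1
        else:
            losses += 1
    return wins, ties, losses


deltas = {name: delta_pp(*accs) for name, accs in TABLE.items()}

if __name__ == "__main__":
    for name, d in deltas.items():
        print(f"{name:15s} {d:+.1f}pp")
    print("wins/ties/losses:", record(TABLE))
```

Because only ten rows are listed, this check covers ten datasets; it cannot reproduce the full 13-dataset record on its own.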

## Result File Format

Each JSON file in `results/experiments/` contains:

```json
{
  "experiment": "imli_breast-cancer_p32_weighted_s42",
  "status": "completed",
  "results": [{
    "dataset": "breast-cancer",
    "test_acc": 0.7807,
    "compression_rate": 0.74,
    "selected_clauses": 52,
    "total_clauses": 200,
    ...
  }]
}
```
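
In the example above, `compression_rate` matches `1 - selected_clauses / total_clauses` (1 - 52/200 = 0.74). Assuming that relationship holds for every file, which is inferred from this one example rather than documented, a short sketch for reading a result file and checking it could look like this. The helper names are illustrative and are not functions from `scripts/`.

```python
import json
from pathlib import Path


def parse_results(text):
    """Return the "results" list of a completed experiment, else an empty list."""
    data = json.loads(text)
    if data.get("status") != "completed":
        return []
    return data.get("results", [])


def implied_compression(entry):
    """Compression rate implied by the clause counts (assumed definition)."""
    return 1 - entry["selected_clauses"] / entry["total_clauses"]


def is_consistent(entry, tol=1e-2):
    """True if the stored compression_rate agrees with the clause counts."""
    return abs(entry["compression_rate"] - implied_compression(entry)) <= tol


def check_directory(directory="results/experiments"):
    """Yield (file name, dataset, test_acc, consistent) for each result entry."""
    for path in sorted(Path(directory).glob("*.json")):
        for entry in parse_results(path.read_text()):
            yield path.name, entry["dataset"], entry["test_acc"], is_consistent(entry)


if __name__ == "__main__":
    for row in check_directory():
        print(row)
```

The check only touches fields shown in the example; files from other scripts may omit the clause fields, in which case `is_consistent` would need a guard.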

## Seeds

All experiments use 5 random seeds: 42, 123, 456, 789, 1001.
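
To repeat the Quick Start run for all five seeds, one option is a small driver that builds the same command once per seed. This is a sketch under assumptions: it treats `-s` as the seed flag (suggested by `-s 42` and the `s42` suffix in the experiment name) and reuses the other Quick Start arguments unchanged. `build_command` and `run_all` are illustrative helpers, not scripts in this folder.

```python
import subprocess

# Seeds listed in this README.
SEEDS = [42, 123, 456, 789, 1001]


def build_command(seed, dataset="breast-cancer"):
    """Quick Start command for run_imli.py with the given seed (assumes -s is the seed)."""
    return [
        "uv", "run", "scripts/run_imli.py",
        "-d", dataset,
        "-p", "32",
        "-s", str(seed),
        "--weighted",
        "--results-dir", "results/",
    ]


def run_all(dry_run=True):
    """Print each command, or execute it when dry_run is False."""
    for seed in SEEDS:
        cmd = build_command(seed)
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

Running with `dry_run=True` only prints the commands, which makes it easy to confirm the arguments before launching the actual experiments.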

## Dependencies

- Python >= 3.10
- pyTsetlinMachine >= 0.6.6
- python-sat >= 1.8
- numpy >= 1.24.0
- scikit-learn >= 1.3.0
- ucimlrepo >= 0.0.3
