# Human-Prior Correction (HPC) - Complete Implementation

This repository contains the complete implementation of the Human-Prior Correction method described in the ICLR 2026 paper "Human-Prior Correction: Scalable Post-hoc Calibration that Aligns Vision Models with Human Uncertainty".

## 🚀 Quick Start

```bash
# Basic evaluation on CIFAR-10H
python evaluate_hpc.py --dataset cifar10h --batch_size 128

# Evaluation on CIFAR-100 with custom model
python evaluate_hpc.py --dataset cifar100 --model_path ./your_model.pth

# Run complete experiment suite
python experiment_configs.py suite --name main_experiments

# Run ablation study over alpha values
python experiment_configs.py ablation --type alpha --dataset cifar10h
```

## 📁 Repository Structure

```
HPC/
├── hpc_core.py                 # Core HPC algorithm implementation
├── proxy_priors.py             # Proxy prior construction (CLIP, DINO, SimCLR)
├── cifar10h_utils.py           # CIFAR-10H human confusion matrix utilities
├── evaluation_metrics.py       # Comprehensive calibration metrics
├── baseline_calibration.py     # Baseline calibration methods
├── data_loaders.py             # Data loading for CIFAR-10/100, ImageNet
├── adaptive_gating.py          # Enhanced adaptive alpha mechanisms
├── conformal_prediction.py     # Conformal prediction integration
├── evaluate_hpc.py             # Main evaluation script
├── experiment_configs.py       # Experiment management and configs
├── requirements.txt            # Python dependencies
└── README.md                   # This file
```

## 🔧 Installation

```bash
# Clone or download this directory
cd HPC

# Install dependencies
pip install -r requirements.txt

# Optional: Install CLIP for proxy priors
pip install git+https://github.com/openai/CLIP.git
```

## 📊 Core Components

### 1. HPC Core Algorithm (`hpc_core.py`)
- Main Human-Prior Correction implementation
- Temperature scaling integration
- Basic adaptive alpha mechanism
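- Bayesian combination: `p' = (1-α)p + α(C·p)`, where `p` is the model's predicted class probabilities, `C` is the human confusion matrix, and `α` is the mixing weight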
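
The combination rule in the last bullet can be sketched in a few lines of NumPy. This is a minimal illustration, not the code in `hpc_core.py`: the helper name `hpc_mix` is hypothetical, and the sketch assumes `C` is column-stochastic (each column sums to 1) so that `C·p` is still a probability vector.

```python
import numpy as np

def softmax(logits, axis=-1):
    """Numerically stable softmax."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def hpc_mix(probs, C, alpha):
    """Hypothetical helper: p' = (1 - alpha) * p + alpha * (C @ p), per row.

    probs: (N, K) model probabilities.
    C:     (K, K) confusion matrix, assumed column-stochastic.
    alpha: mixing weight in [0, 1].
    """
    prior_term = probs @ C.T  # row i equals C @ probs[i]
    return (1.0 - alpha) * probs + alpha * prior_term

rng = np.random.default_rng(0)
probs = softmax(rng.normal(size=(4, 3)))
C = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]])
corrected = hpc_mix(probs, C, alpha=0.3)
print(corrected.sum(axis=1))  # each row still sums to 1
```

Because both terms are probability vectors under the stated assumption, the output stays normalized for any `alpha` in `[0, 1]`, and `alpha=0` returns the model's probabilities unchanged.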

### 2. Proxy Prior Construction (`proxy_priors.py`)
- **CLIP-based priors**: Text and image embedding similarities
- **Self-supervised priors**: DINO/SimCLR feature-based confusion matrices
- **Few-shot human priors**: Combine limited human data with proxy priors
- Ready-to-use CIFAR-10/100 configurations
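
As a rough illustration of how embedding similarities could be turned into a confusion-matrix prior, the sketch below applies a column-wise softmax to cosine similarities between class embeddings. It is an assumption-laden example rather than the logic in `proxy_priors.py`: the function name `similarity_to_confusion` and the `temperature` parameter are hypothetical, and random vectors stand in for real CLIP or DINO/SimCLR embeddings.

```python
import numpy as np

def similarity_to_confusion(embeddings, temperature=0.1):
    """Hypothetical helper: build a column-stochastic (K, K) prior.

    embeddings:  (K, D) array with one embedding per class.
    temperature: assumed softmax temperature; smaller values give a
                 prior closer to the identity matrix.
    """
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    similarity = unit @ unit.T                    # cosine similarity, (K, K)
    scaled = similarity / temperature
    scaled -= scaled.max(axis=0, keepdims=True)   # numerical stability
    weights = np.exp(scaled)
    return weights / weights.sum(axis=0, keepdims=True)

# Random vectors stand in for per-class embeddings (illustration only).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(5, 16))
prior = similarity_to_confusion(embeddings)
print(prior.sum(axis=0))  # every column sums to 1
```

Each class is most similar to itself, so the largest entry of every column lies on the diagonal; the off-diagonal mass reflects how close two classes are in embedding space.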

### 3. Enhanced Adaptive Mechanisms (`adaptive_gating.py`)
- **Enhanced Adaptive Alpha**: Multi-layer networks with attention
- **Uncertainty-Aware Gating**: Predictive uncertainty estimation
- **Multi-Scale Gating**: Local and global context integration
- **Hierarchical Gating**: Class-level and semantic group awareness
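
To make the idea of a per-sample `α` concrete, here is a deliberately simple stand-in: it scales a base `α` by the normalized entropy of each prediction, so confident predictions receive less correction. This entropy heuristic is a swapped-in illustration, not one of the learned gating mechanisms listed above, and the name `entropy_scaled_alpha` is hypothetical.

```python
import numpy as np

def entropy_scaled_alpha(probs, base_alpha=0.3, eps=1e-12):
    """Hypothetical heuristic: alpha_i = base_alpha * H(p_i) / log(K).

    probs: (N, K) predicted probabilities.
    Returns an (N,) array of per-sample alphas in [0, base_alpha].
    """
    num_classes = probs.shape[1]
    entropy = -(probs * np.log(probs + eps)).sum(axis=1)
    return base_alpha * entropy / np.log(num_classes)

probs = np.array([[0.98, 0.01, 0.01],    # confident prediction
                  [1 / 3, 1 / 3, 1 / 3]])  # maximally uncertain prediction
alphas = entropy_scaled_alpha(probs)
print(alphas)  # the uniform row receives (almost exactly) base_alpha
```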

### 4. Evaluation Metrics (`evaluation_metrics.py`)
- **Expected Calibration Error (ECE)**: Standard and human-targeted versions
- **Negative Log-Likelihood**: Both true labels (NLL_true) and human distributions (NLL_human)
- **Reliability Diagrams**: Traditional and human-centric calibration plots
- **Decision Utility**: AURC, coverage at risk thresholds
- **Robustness Analysis**: Performance under distribution shift
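
For readers unfamiliar with these metrics, the following sketch computes the standard equal-width-bin ECE and a cross-entropy form of NLL against human soft labels. It is a minimal reference under stated assumptions, not the implementation in `evaluation_metrics.py`: the function names, the default of 15 bins, and the toy data are illustrative choices.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=15):
    """Standard ECE: bin-weighted gap between accuracy and confidence."""
    confidences = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lower, upper in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lower) & (confidences <= upper)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap
    return float(ece)

def nll_human(probs, human_dist, eps=1e-12):
    """Cross-entropy between human label distributions and model probabilities."""
    return float(-(human_dist * np.log(probs + eps)).sum(axis=1).mean())

# Toy data (illustrative only).
probs = np.array([[0.95, 0.05], [0.95, 0.05], [0.65, 0.35], [0.65, 0.35]])
labels = np.array([0, 0, 0, 1])
human = np.array([[1.0, 0.0], [0.9, 0.1], [0.6, 0.4], [0.3, 0.7]])

ece = expected_calibration_error(probs, labels)
print(f"ECE = {ece:.3f}")  # 0.100: two bins, gaps 0.05 and 0.15, weight 0.5 each
print(f"NLL_human = {nll_human(probs, human):.3f}")
```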

### 5. Baseline Methods (`baseline_calibration.py`)
- Temperature Scaling, Vector Scaling, Matrix Scaling
- Dirichlet Calibration, Histogram Binning
- Isotonic Regression
- Ensemble methods
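
Temperature scaling, the simplest of these baselines, divides the logits by a single scalar `T` chosen to minimize NLL on held-out data. The sketch below fits `T` with a plain grid search; it is an illustrative assumption, not the routine in `baseline_calibration.py`, and the names `nll`, `fit_temperature`, and the grid range are hypothetical.

```python
import numpy as np

def nll(logits, labels, temperature):
    """Mean negative log-likelihood of the true labels at a given temperature."""
    scaled = logits / temperature
    scaled = scaled - scaled.max(axis=1, keepdims=True)
    log_probs = scaled - np.log(np.exp(scaled).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

def fit_temperature(logits, labels, grid=None):
    """Hypothetical helper: pick the grid temperature with the lowest NLL."""
    if grid is None:
        grid = np.linspace(0.25, 5.0, 96)  # assumed search range
    losses = [nll(logits, labels, t) for t in grid]
    return float(grid[int(np.argmin(losses))])

# Synthetic validation set with deliberately inflated (overconfident) logits.
rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=200)
logits = rng.normal(size=(200, 3))
logits[np.arange(200), labels] += 1.0
logits *= 4.0

temperature = fit_temperature(logits, labels)
print(f"T = {temperature:.2f}")
print(f"NLL before: {nll(logits, labels, 1.0):.3f}, after: {nll(logits, labels, temperature):.3f}")
```

A gradient-based optimizer such as L-BFGS is the more common choice in practice; the grid search is used here only to keep the example dependency-free.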

### 6. Data Handling (`data_loaders.py`)
- CIFAR-10H with human annotation support
- CIFAR-100 with synthetic human distributions
- ImageNet subset for scalability testing
- Consistent interface across datasets

### 7. Conformal Prediction (`conformal_prediction.py`)
- Standard conformal prediction with APS scores
- Adaptive conformal prediction for distribution shift
- HPC-aware conformal sets incorporating human priors
- Risk-controlling prediction sets
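
The first bullet, split conformal prediction with APS scores, can be sketched as follows. This is a non-randomized variant written for illustration and is not taken from `conformal_prediction.py`; all function names are hypothetical, and the synthetic data simply samples labels from the model's own probabilities.

```python
import numpy as np

def _sorted_cumulative(probs):
    """Sort classes by descending probability and accumulate their mass."""
    order = np.argsort(-probs, axis=1)
    sorted_probs = np.take_along_axis(probs, order, axis=1)
    return order, np.cumsum(sorted_probs, axis=1)

def aps_scores(probs, labels):
    """APS score: cumulative mass down to and including the true class."""
    order, cumulative = _sorted_cumulative(probs)
    ranks = np.argmax(order == labels[:, None], axis=1)
    return cumulative[np.arange(len(labels)), ranks]

def calibrate_threshold(scores, miscoverage=0.1):
    """Finite-sample quantile: the ceil((n + 1)(1 - miscoverage))-th smallest score."""
    n = len(scores)
    k = min(int(np.ceil((n + 1) * (1.0 - miscoverage))), n)
    return float(np.sort(scores)[k - 1])

def prediction_sets(probs, threshold):
    """Boolean (N, K) mask of classes whose APS score is within the threshold."""
    order, cumulative = _sorted_cumulative(probs)
    keep_sorted = cumulative <= threshold
    keep_sorted[:, 0] = True  # always keep the top class so no set is empty
    keep = np.zeros_like(keep_sorted)
    np.put_along_axis(keep, order, keep_sorted, axis=1)
    return keep

# Synthetic data (illustration only): labels drawn from the model's probabilities.
rng = np.random.default_rng(0)
logits = rng.normal(size=(1000, 5)) * 2.0
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)
labels = np.array([rng.choice(5, p=p) for p in probs])

cal_probs, cal_labels = probs[:500], labels[:500]
test_probs, test_labels = probs[500:], labels[500:]

threshold = calibrate_threshold(aps_scores(cal_probs, cal_labels), miscoverage=0.1)
sets = prediction_sets(test_probs, threshold)
coverage = sets[np.arange(len(test_labels)), test_labels].mean()
print(f"coverage = {coverage:.3f}, mean set size = {sets.sum(axis=1).mean():.2f}")
```

An HPC-aware variant would presumably pass corrected probabilities in place of `probs`; that wiring is specific to the repository and is not shown here.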

## 🧪 Running Experiments

### Main Paper Results

```bash
# CIFAR-10H with empirical human confusion matrix
python evaluate_hpc.py --dataset cifar10h --batch_size 128

# CIFAR-100 with CLIP proxy prior
python evaluate_hpc.py --dataset cifar100 --batch_size 128

# ImageNet scalability (subset)
python evaluate_hpc.py --dataset imagenet --batch_size 64 --max_batches 100
```

### Ablation Studies

```bash
# Alpha parameter ablation
python experiment_configs.py ablation --type alpha --dataset cifar10h

# Method comparison (empirical vs CLIP vs adaptive)
python experiment_configs.py ablation --type methods --dataset cifar10h
```

### Custom Experiments

```python
from experiment_configs import HPCExperimentConfig, ExperimentManager

# Create custom configuration
config = HPCExperimentConfig(
    dataset="cifar10h",
    hpc_method="clip",
    alpha=0.25,
    use_adaptive_alpha=True,
    gating_strategy="uncertainty_based"
)

# Run experiment
manager = ExperimentManager()
result_dir = manager.run_experiment("my_experiment", config)
```

## 📈 Expected Results

Based on the paper, expect approximately the following values:

### CIFAR-10H Results
- **HPC (Empirical)**: ECE ~0.06, NLL_human ~0.45
- **HPC (CLIP)**: ECE ~0.08, NLL_human ~0.52
- **Temperature Scaling**: ECE ~0.12, NLL_human ~0.68
- **Uncalibrated**: ECE ~0.18, NLL_human ~0.85

### CIFAR-100 Results
- **HPC (Proxy)**: ECE ~0.15, NLL_human ~1.2
- **Temperature Scaling**: ECE ~0.22, NLL_human ~1.8

### Key Metrics
- **ECE**: Expected Calibration Error (lower is better)
- **NLL_human**: Negative log-likelihood w.r.t human distributions (lower is better)
- **NLL_true**: Traditional NLL w.r.t true labels (lower is better)
- **AURC**: Area Under Risk-Coverage curve (lower is better)
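
AURC is the least familiar of these metrics, so a small worked example may help. The sketch below ranks samples by confidence, computes the error rate among the top-k most confident samples for every k, and averages those risks. It is one common empirical definition, offered as an assumption; the repository's own computation may differ in details such as tie handling, and the name `aurc` is hypothetical.

```python
import numpy as np

def aurc(confidences, correct):
    """Hypothetical helper: mean selective risk over all coverage levels.

    confidences: (N,) confidence scores (higher means more confident).
    correct:     (N,) array of 1.0 for correct predictions, 0.0 otherwise.
    """
    order = np.argsort(-confidences)             # most confident first
    errors = 1.0 - correct[order]
    risks = np.cumsum(errors) / np.arange(1, len(errors) + 1)
    return float(risks.mean())

confidences = np.array([0.9, 0.8, 0.7, 0.6])
correct = np.array([1.0, 1.0, 0.0, 1.0])
score = aurc(confidences, correct)
print(f"AURC = {score:.4f}")  # risks are 0, 0, 1/3, 1/4, so the mean is 7/48 ≈ 0.1458
```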

## 🔬 Advanced Usage

### Custom Human Confusion Matrix

```python
import torch

from cifar10h_utils import CIFAR10HUtils
from hpc_core import HumanPriorCorrection

# Load your human annotations
utils = CIFAR10HUtils()
confusion_matrix = utils.load_human_annotations("your_annotations.json")

# Placeholder logits of shape (batch, num_classes); replace with your model's outputs
logits = torch.randn(8, 10)

# Apply HPC
hpc = HumanPriorCorrection(num_classes=10)
corrected_probs = hpc.apply_correction(logits, confusion_matrix, alpha=0.3)
```

### Custom Proxy Prior

```python
from proxy_priors import ProxyPriorConstructor

constructor = ProxyPriorConstructor()
class_names = ["dog", "cat", "car", ...]  # Your class names

# Create CLIP-based confusion matrix
confusion_matrix = constructor.create_clip_prior(class_names)
```

### Advanced Adaptive Alpha

```python
import torch

from adaptive_gating import create_gating_mechanism

# Create uncertainty-aware gating
gating = create_gating_mechanism(
    "uncertainty_aware",
    input_dim=10,
    monte_carlo_samples=20
)

# Placeholder logits of shape (batch, input_dim); replace with your model's outputs
logits = torch.randn(8, 10)

# Compute adaptive alpha
alpha_values, uncertainty_info = gating(logits, base_alpha=0.3)
```

## 🧮 Computing Requirements

- **Memory**: 8GB+ RAM for CIFAR experiments, 16GB+ for ImageNet
- **GPU**: Optional but recommended for faster evaluation
- **Time**: ~5-10 minutes for CIFAR-10H, ~30 minutes for CIFAR-100 full evaluation

## 📝 Citation

If you use this implementation, please cite:

```bibtex
@inproceedings{your_paper_2026,
  title={Human-Prior Correction: Scalable Post-hoc Calibration that Aligns Vision Models with Human Uncertainty},
  author={Your Name and Others},
  booktitle={International Conference on Learning Representations},
  year={2026}
}
```

## 🤝 Contributing

This implementation follows the methodology described in the ICLR 2026 paper. For questions or improvements:

1. Check the paper for theoretical details
2. Review the extensive inline documentation
3. Run the provided test examples
4. Compare results with expected paper benchmarks

## 📋 Dependencies

- PyTorch >= 1.12.0
- torchvision >= 0.13.0
- numpy >= 1.21.0
- scipy >= 1.7.0
- matplotlib >= 3.5.0
- scikit-learn >= 1.0.0
- Pillow >= 8.3.0
- CLIP (optional, for CLIP priors; installed from the OpenAI GitHub repository as shown under Installation)
- timm >= 0.6.0 (for self-supervised models)

## 🔍 Troubleshooting

### Common Issues

1. **CLIP import error**: Install with `pip install git+https://github.com/openai/CLIP.git`
2. **Memory issues**: Reduce `--batch_size` or cap the evaluation with `--max_batches`
3. **Missing human annotations**: The loader falls back to synthetic distributions and emits a warning
4. **Model loading errors**: Check model path and architecture compatibility

### Performance Tips

1. Use GPU when available: `--device cuda`
2. Reduce evaluation size for testing: `--max_batches 10`
3. Cache proxy priors: They're computed once and reused
4. Use appropriate batch sizes: 128 for CIFAR, 64 for ImageNet

## 📊 Verification

To verify your installation works correctly:

```bash
# Quick test with synthetic data
python -c "
from hpc_core import HumanPriorCorrection
from evaluation_metrics import evaluate_calibration_comprehensive
import torch
import torch.nn.functional as F

# Synthetic test
logits = torch.randn(100, 10)
targets = torch.randint(0, 10, (100,))
confusion = torch.eye(10) * 0.8 + torch.ones(10, 10) * 0.02

hpc = HumanPriorCorrection(10)
corrected = hpc.apply_correction(logits, confusion, alpha=0.3)

print('✓ HPC core working')
print(f'Corrected probabilities shape: {corrected.shape}')
print(f'Probabilities sum to 1: {torch.allclose(corrected.sum(dim=1), torch.ones(100))}')
"
```

Expected output:
```
✓ HPC core working
Corrected probabilities shape: torch.Size([100, 10])
Probabilities sum to 1: True
```
