# Neural Network Robustness Experiments

This directory contains code for evaluating the adversarial robustness of neural networks in Rashomon sets. These experiments are presented in **Appendix D.3** of the paper.

## Overview

The experiments extend the robustness analysis from decision trees to neural networks, demonstrating that:
- Rashomon sets of neural networks exhibit improved robustness properties
- Adversarial Weight Perturbation (AWP) can be used to generate diverse robust models
- The robustness-privacy trade-off applies to neural networks as well

## Directory Structure

```
robustness_nn/
├── awp.py                  # Adversarial Weight Perturbation implementation
├── dataset.py              # Dataset loading and preprocessing
├── fgsm.py                 # Fast Gradient Sign Method (FGSM) attacks
├── run_awp.py              # Script to run AWP training
├── run_experiment.py       # Main experiment runner
└── plot.py                 # Plotting and visualization
```

## Files Description

- **awp.py**: Implements the Adversarial Weight Perturbation method for training robust neural networks and generating diverse models in Rashomon sets
- **dataset.py**: Handles dataset loading, preprocessing, and train/test splits for binary classification tasks
- **fgsm.py**: Implements FGSM and other adversarial attack methods for evaluating robustness
- **run_awp.py**: Training script for generating Rashomon sets using AWP
- **run_experiment.py**: Main script for running robustness experiments across datasets
- **plot.py**: Generates plots comparing robustness of single models vs. Rashomon sets

## Running the Experiments

### Step 1: Train Models with AWP

Generate Rashomon sets of neural networks:

```bash
python run_awp.py
```

This will train multiple models using AWP with different hyperparameters.
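This README does not document the exact sweep that `run_awp.py` performs. As a minimal sketch, assume the script trains one model per combination of hyperparameter values. The configurations could then be enumerated as follows. `SEARCH_SPACE`, `expand_grid`, and every value in the grid are hypothetical.

```python
import itertools

# Hypothetical search space. The keys mirror knobs described in this README
# (hidden, depth, awp_gamma); the values are illustrative only.
SEARCH_SPACE = {
    "hidden": [10, 20],
    "depth": [2, 4],
    "awp_gamma": [0.001, 0.005, 0.01],
    "seed": [0, 1],
}


def expand_grid(space):
    """Return one dict per combination of hyperparameter values."""
    keys = list(space)
    return [dict(zip(keys, values))
            for values in itertools.product(*(space[k] for k in keys))]


configs = expand_grid(SEARCH_SPACE)
print(len(configs))  # 2 * 2 * 3 * 2 = 24
print(configs[0])    # {'hidden': 10, 'depth': 2, 'awp_gamma': 0.001, 'seed': 0}
```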

### Step 2: Run Robustness Evaluation

Evaluate adversarial robustness:

```bash
python run_experiment.py
```

This script:
- Loads trained models
- Applies FGSM attacks with various epsilon values
- Computes robustness metrics
- Saves results for analysis
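The evaluation loop can be sketched in NumPy. This sketch is not the code in `fgsm.py`. It swaps in a logistic-regression model for the MLP so that the input gradient has a closed form, and it does not clip perturbed inputs to a valid feature range. `fgsm_logistic` and `robust_accuracy_curve` are hypothetical names.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fgsm_logistic(X, y, w, b, eps):
    """FGSM for a logistic model p = sigmoid(X @ w + b) under binary cross-entropy.

    The gradient of each example's loss w.r.t. its input is (p - y) * w, so
    every feature is moved by eps in the sign of that gradient.
    """
    p = sigmoid(X @ w + b)
    grad_x = (p - y)[:, None] * w[None, :]
    return X + eps * np.sign(grad_x)


def robust_accuracy_curve(X, y, w, b, eps_values):
    """Accuracy on FGSM-perturbed inputs for each epsilon (eps = 0 gives clean accuracy)."""
    curve = {}
    for eps in eps_values:
        X_adv = fgsm_logistic(X, y, w, b, eps)
        preds = (X_adv @ w + b > 0).astype(int)
        curve[eps] = float(np.mean(preds == y))
    return curve


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 1.5])
y = (X @ w_true > 0).astype(int)
# The generating weights stand in for a "trained" model here.
curve = robust_accuracy_curve(X, y, w_true, 0.0, [0.0, 0.05, 0.1, 0.2, 0.3])
print(curve)
```

Passing `eps = 0.0` reproduces clean accuracy, so a single call yields one accuracy-vs.-epsilon curve of the kind plotted in Step 3.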

### Step 3: Generate Plots

Create visualization plots:

```bash
python plot.py
```

This generates plots showing:
- Accuracy vs. robustness trade-offs
- Comparison between single models and Rashomon sets
- Robustness across different attack strengths

## Supported Datasets

The experiments support several binary classification datasets:

- **COMPAS**: Recidivism prediction
- **FICO**: Credit scoring
- **Iris**: Flower classification (binary subset)
- **Penguin**: Penguin species classification (binary subset)
- **Digits4**: Handwritten digit recognition (4 vs. not-4)
- **Seeds**: Wheat seed classification
- **Wine**: Wine quality classification

Dataset selection can be configured in `run_experiment.py`:

```python
datasets = ['compas', 'fico', 'iris']
```
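`dataset.py` handles loading, preprocessing, and train/test splits, but this README does not describe its interface. The following is a minimal sketch of preprocessing for a one-vs-rest binary task such as Digits4 (4 vs. not-4). The helper `prepare_binary`, the 80/20 split, and standardization are assumptions, not the repository's code.

```python
import numpy as np


def prepare_binary(X, labels, positive_label, test_frac=0.2, seed=0):
    """One-vs-rest binarization, shuffled train/test split, and standardization.

    Feature means and standard deviations come from the training split only,
    so no test-set statistics leak into preprocessing.
    """
    X = np.asarray(X, dtype=float)
    y = (np.asarray(labels) == positive_label).astype(int)
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(round(test_frac * len(X)))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    mu = X[train_idx].mean(axis=0)
    sigma = X[train_idx].std(axis=0)
    sigma[sigma == 0] = 1.0  # guard against constant features

    def scale(A):
        return (A - mu) / sigma

    return scale(X[train_idx]), y[train_idx], scale(X[test_idx]), y[test_idx]


# Toy stand-in for a "digit 4 vs. not-4" style task.
labels = np.arange(50) % 10
X = np.random.default_rng(1).normal(size=(50, 3))
X_tr, y_tr, X_te, y_te = prepare_binary(X, labels, positive_label=4)
print(X_tr.shape, X_te.shape, y_tr.sum() + y_te.sum())  # (40, 3) (10, 3) 5
```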

## Model Architecture

The experiments use a multi-layer perceptron (MLP) architecture defined in `awp.py`:

```python
from awp import MLPBinary2Logits

# Example: 4 hidden layers (depth=4) of 20 units each, no dropout; d is the input dimension
ctor = lambda d: MLPBinary2Logits(d=d, hidden=20, depth=4, dropout=0.0)
```

You can customize:
- `hidden`: Number of hidden units per layer
- `depth`: Number of hidden layers
- `dropout`: Dropout rate for regularization
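This README does not show `MLPBinary2Logits` itself. The following NumPy sketch makes several assumptions: `depth` counts hidden layers of width `hidden`, the activations are ReLU, and a final linear layer produces two logits (as the class name suggests). Under those assumptions, it shows how the knobs affect model size. `mlp_param_count`, `init_mlp`, and `mlp_forward` are hypothetical helpers, and dropout is omitted.

```python
import numpy as np


def mlp_param_count(d, hidden, depth, n_out=2):
    """Weights + biases of an MLP with `depth` hidden layers of width `hidden`."""
    first = d * hidden + hidden                        # input -> hidden layer 1
    middle = (depth - 1) * (hidden * hidden + hidden)  # hidden i -> hidden i+1
    last = hidden * n_out + n_out                      # last hidden -> logits
    return first + middle + last


def init_mlp(d, hidden, depth, n_out=2, seed=0):
    rng = np.random.default_rng(seed)
    sizes = [d] + [hidden] * depth + [n_out]
    # He-style scaling for ReLU layers; biases start at zero.
    return [(rng.normal(scale=np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def mlp_forward(params, X):
    """ReLU hidden layers followed by a linear layer producing 2 logits."""
    h = X
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)
    W, b = params[-1]
    return h @ W + b


params = init_mlp(d=10, hidden=20, depth=4)
print(mlp_param_count(10, 20, 4))                   # 1522
print(sum(W.size + b.size for W, b in params))      # 1522
print(mlp_forward(params, np.ones((3, 10))).shape)  # (3, 2)
```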

## Attack Configuration

FGSM attack strengths (epsilon values) can be configured in the scripts, e.g. `run_experiment.py`:

```python
epsilon_values = [0.05, 0.1, 0.2, 0.3]  # Attack strengths
```
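For reference, FGSM perturbs each input by $\epsilon$ along the sign of the loss gradient with respect to that input. Each entry of `epsilon_values` is therefore an $\ell_\infty$ budget measured in the units of the model's input features. Here $f_\theta$ denotes the network and $\mathcal{L}$ its loss. This README does not state whether the scripts additionally clip $x_{\text{adv}}$ to a valid feature range.

```math
x_{\text{adv}} = x + \epsilon \cdot \operatorname{sign}\!\left(\nabla_x \mathcal{L}\big(f_\theta(x), y\big)\right),
\qquad \lVert x_{\text{adv}} - x \rVert_\infty \le \epsilon
```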

## Expected Output

Running the experiments produces:

1. **Trained Models**: Saved model checkpoints for single models and Rashomon sets
2. **Results Files**: JSON/CSV files with robustness metrics
3. **Plots**: 
   - Accuracy vs. epsilon curves
   - Robust accuracy comparisons
   - Trade-off visualizations

## Key Metrics

The experiments compute:

- **Clean Accuracy**: Accuracy on unperturbed test data
- **Robust Accuracy**: Accuracy under adversarial attacks
- **Attack Success Rate**: Percentage of successful adversarial examples
- **Average Confidence**: Model confidence on predictions
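Definitions of these metrics vary between papers, and this README does not pin them down. The minimal sketch below makes two assumptions. First, attack success rate is measured over the examples the model classified correctly *before* the attack. Second, average confidence is the mean max-softmax probability on clean inputs. `robustness_metrics` is a hypothetical helper that works on per-example 2-logit outputs.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def _argmax(values):
    return max(range(len(values)), key=values.__getitem__)


def robustness_metrics(clean_logits, adv_logits, labels):
    """Compute the four metrics from clean and adversarial logits.

    Assumed definitions: attack success rate = fraction of initially correct
    examples that the attack flips; average confidence = mean max-softmax
    probability on clean inputs.
    """
    n = len(labels)
    clean_ok = [_argmax(v) == y for v, y in zip(clean_logits, labels)]
    adv_ok = [_argmax(v) == y for v, y in zip(adv_logits, labels)]
    n_clean_ok = sum(clean_ok)
    flipped = sum(c and not a for c, a in zip(clean_ok, adv_ok))
    return {
        "clean_accuracy": n_clean_ok / n,
        "robust_accuracy": sum(adv_ok) / n,
        "attack_success_rate": flipped / n_clean_ok if n_clean_ok else 0.0,
        "average_confidence": sum(max(softmax(v)) for v in clean_logits) / n,
    }


clean = [[2.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 3.0]]
adv = [[0.0, 1.0], [0.0, 2.0], [1.0, 0.0], [2.0, 0.0]]
m = robustness_metrics(clean, adv, labels=[0, 1, 1, 1])
print(m["clean_accuracy"], m["robust_accuracy"])  # 0.75 0.25
```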

## Adversarial Weight Perturbation (AWP)

AWP generates diverse models by:
1. Training a base model to high accuracy
2. Perturbing model weights in adversarially chosen directions
3. Fine-tuning perturbed models to maintain accuracy
4. Selecting models that form a Rashomon set
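Step 4 can be illustrated with the usual definition of a Rashomon set: the models whose loss lies within a tolerance of the best model found. The sketch uses the additive form $L(f) \le L^* + \epsilon$. Some work uses the multiplicative form $(1 + \epsilon) L^*$ instead, and this README does not say which form the repository uses. `rashomon_filter` is hypothetical, and the losses below are toy numbers, not results.

```python
def rashomon_filter(candidates, epsilon):
    """Keep candidates whose loss is within `epsilon` of the best loss found.

    `candidates` is a list of (model_id, loss) pairs; this uses the additive
    definition loss <= best + epsilon.
    """
    best = min(loss for _, loss in candidates)
    return [model_id for model_id, loss in candidates if loss <= best + epsilon]


# Toy losses for a base model and three AWP-perturbed, fine-tuned variants.
candidates = [("base", 0.30), ("awp_1", 0.31), ("awp_2", 0.36), ("awp_3", 0.325)]
print(rashomon_filter(candidates, epsilon=0.03))  # ['base', 'awp_1', 'awp_3']
```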

Parameters for AWP:
- `awp_gamma`: Weight perturbation magnitude
- `awp_warmup`: Number of warmup epochs before AWP
- `num_models`: Number of models to generate in Rashomon set
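This README does not show the repository's AWP loop. The NumPy sketch below illustrates the core AWP update described by Wu et al. (2020). At each step, the weights are pushed one ascent step in the direction that increases the loss, with a perturbation scaled to `awp_gamma` times the weight norm. The gradient taken at the perturbed weights is then applied to the *unperturbed* weights. For brevity, the sketch differs from the paper in two ways. It uses a logistic model on clean data, whereas the paper perturbs against the adversarial-example loss. It also uses a single weight vector, whereas the paper uses per-layer norms. How `run_awp.py` actually uses `awp_gamma` and `awp_warmup` is assumed; `train_with_awp` is hypothetical.

```python
import numpy as np


def bce_loss_and_grad(w, X, y):
    """Mean binary cross-entropy of a logistic model and its gradient w.r.t. w."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    tiny = 1e-12
    loss = -np.mean(y * np.log(p + tiny) + (1 - y) * np.log(1 - p + tiny))
    grad = X.T @ (p - y) / len(y)
    return loss, grad


def train_with_awp(X, y, epochs=200, lr=0.5, awp_gamma=0.01, awp_warmup=20, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.01, size=X.shape[1])
    for epoch in range(epochs):
        _, g = bce_loss_and_grad(w, X, y)
        if epoch >= awp_warmup:
            # Adversarial weight perturbation: one ascent step whose size is
            # awp_gamma relative to ||w||.
            v = awp_gamma * np.linalg.norm(w) * g / (np.linalg.norm(g) + 1e-12)
            # Gradient of the loss at the perturbed weights w + v ...
            _, g = bce_loss_and_grad(w + v, X, y)
        # ... is applied to the unperturbed weights (the perturbation is discarded).
        w = w - lr * g
    return w


rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = (X @ np.array([1.0, -1.0, 0.5, 2.0]) > 0).astype(float)
w = train_with_awp(X, y)
acc = float(np.mean((X @ w > 0) == (y == 1)))
print(f"train accuracy: {acc:.2f}")
```

Under these assumptions, training several such models with different seeds or `awp_gamma` values would provide the candidate pool that step 4 filters into a Rashomon set.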

## Performance Notes

- **GPU Recommended**: Neural network training benefits significantly from GPU acceleration
- **Training Time**: Generating Rashomon sets with AWP can take several hours depending on dataset size
- **Memory**: Larger models and batch sizes require more memory

## Troubleshooting

**Issue**: CUDA out of memory
- **Solution**: Reduce batch size or model size (hidden units/depth)

**Issue**: Poor model convergence
- **Solution**: Adjust learning rate, increase training epochs, or modify AWP parameters

**Issue**: Low robust accuracy
- **Solution**: This may be expected for high epsilon values; try smaller perturbation budgets

## Comparison with Tree-Based Methods

Key differences from `robustness_tree/`:
- Uses neural networks instead of decision trees
- Implements AWP for Rashomon set generation instead of GOSDT
- FGSM attacks instead of tree-specific attacks
- Applicable to more complex/continuous feature spaces

## Citation

If you use the AWP method, please cite the relevant paper:

```bibtex
@inproceedings{wu2020adversarial,
  title={Adversarial Weight Perturbation Helps Robust Generalization},
  author={Wu, Dongxian and Xia, Shu-Tao and Wang, Yisen},
  booktitle={NeurIPS},
  year={2020}
}
```

## Notes

- This implementation is optimized for binary classification tasks
- For multi-class problems, modify the architecture and loss functions accordingly
- Experiments can be parallelized across different datasets
- Results should complement the tree-based robustness findings in the main paper
