================================================================================
REPRODUCTION GUIDE - ICML 2026 SUPPLEMENTARY MATERIAL
Robust Regression Certification via Randomized Smoothing
================================================================================

This guide provides explicit commands to reproduce all experiments in the paper.
All commands should be run from the root directory of this submission folder.

================================================================================
INSTALLATION
================================================================================

1. Install the package:
    pip install -e .

2. Verify installation:
    python -c "from alpha_smoothing_repro.certify.bounded_fn_certifier_with_mean import BoundedCertifierWithMean; print('✓ Installation successful')"
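
The same check can be scripted so that a missing package produces a readable
message instead of a traceback. This is a minimal sketch using only the standard
library; the helper name is_installed is illustrative and not part of the
submission.

```python
import importlib.util


def is_installed(package: str) -> bool:
    """Return True if a top-level package can be located on sys.path."""
    # find_spec returns None (no exception) when a top-level name is absent.
    return importlib.util.find_spec(package) is not None


if is_installed("alpha_smoothing_repro"):
    print("alpha_smoothing_repro found")
else:
    print("alpha_smoothing_repro missing: run 'pip install -e .' from the root directory")
```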

3. Note on MNIST data:
    - The first time you run MNIST experiments, the scripts will automatically 
      download the original MNIST dataset (~10MB) to ./data/
    - When using --use_rotation_dataset, the script will then generate rotated 
      versions (random angles 0-360°) on-the-fly using 
      experiments/mnist_rotation/dataset_generator.py
    - This rotation generation takes ~1-2 minutes for the test set (10,000 images)
    - The rotation is deterministic (seed=42) so results are reproducible
    - You will see: "Generating 10000 rotated samples..." - this is expected!
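
Why a fixed seed makes the rotated test set reproducible can be sketched as
follows. This is only an illustration with Python's standard random module; the
function name rotation_angles is hypothetical, and dataset_generator.py may use
a different generator, so the angles below will not match the submission's.

```python
import random
from typing import List


def rotation_angles(n_samples: int, seed: int = 42) -> List[float]:
    """Draw n_samples angles uniformly from [0, 360] degrees, reproducibly."""
    rng = random.Random(seed)  # private generator; global random state is untouched
    return [rng.uniform(0.0, 360.0) for _ in range(n_samples)]


angles_a = rotation_angles(10_000)
angles_b = rotation_angles(10_000)
print(angles_a == angles_b)  # True: same seed, same angles
```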

================================================================================
VALIDATING CERTIFICATE TIGHTNESS ON SYNTHETIC DATA
================================================================================

PURPOSE:
Validate soundness and tightness on 3 unbounded synthetic functions where the true
worst-case radius can be computed analytically. Compares the (C,G) method with the
α-trimming baseline.

PARAMETERS:
- Functions: quadratic, slice, sandwich
- σ ∈ {0.1, 0.2, 0.5}, ε_y ∈ {0.2, 0.5}
- α-trimming: α ∈ {0.35, 0.49}, P = 0.9
- N = 5,000 samples
- 10 test points per function

COMMAND TO RUN:

    python scripts/analysis/test_unbounded_certifiers_synthetic.py \
        --function all_unbounded \
        --sigma 0.1 \
        --eps_y 0.5 \
        --alpha_trim 0.35 \
        --P 0.9 \
        --N_samples 5000 \
        --n_test_points 10 \
        --compute_true_radius \
        --output synthetic_validation_results.json

BATCH SCRIPT FOR MULTIPLE PARAMETERS:

    bash scripts/experiments/run_unbounded_synthetic_experiments.sh

This script tests:
- σ ∈ {0.1, 0.2, 0.5}
- ε_y ∈ {0.2, 0.5}
- α ∈ {0.35, 0.49}
- Creates 12 total combinations (3 sigma × 2 epsilon_y × 2 alpha)

Results saved in: unbounded_synthetic_experiments_results/
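
The 12 combinations are the Cartesian product of the three parameter lists. A
short sketch of that enumeration is given here; the dictionary keys mirror the
command-line flags above, while the iteration order used by the shell script is
an assumption.

```python
from itertools import product

SIGMAS = [0.1, 0.2, 0.5]
EPS_Y = [0.2, 0.5]
ALPHAS = [0.35, 0.49]

# Cartesian product of the three parameter lists.
grid = [
    {"sigma": s, "eps_y": e, "alpha_trim": a}
    for s, e, a in product(SIGMAS, EPS_Y, ALPHAS)
]

print(len(grid))  # 12
print(grid[0])    # {'sigma': 0.1, 'eps_y': 0.2, 'alpha_trim': 0.35}
```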

EXPECTED OUTPUT:
- File: synthetic_validation_results.json
- Contains: Certified radii, true radii, soundness check (cert ≤ true)
- Computation time: ~5-10 minutes per parameter combination

EXPECTED RESULTS:
- (C, G): 100% soundness (all certified radii ≤ true radii), mean tightness ratios 0.76-0.94
- α-trimming: variable soundness (40-100%), mean tightness ratios 0.02-0.83
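
One way to read these two numbers is sketched below. It assumes the tightness
ratio is certified / true (so 1.0 means an exact certificate) and that all true
radii are positive. The function name and the example radii are made up for
illustration and are not taken from the results file.

```python
from typing import List, Tuple


def soundness_and_tightness(certified: List[float], true: List[float]) -> Tuple[float, float]:
    """Return (fraction of sound certificates, mean tightness ratio).

    A certificate is sound when certified <= true; the tightness ratio is
    certified / true, so 1.0 means the certificate is exact.
    """
    if len(certified) != len(true) or not certified:
        raise ValueError("need two non-empty lists of equal length")
    sound = [c <= t for c, t in zip(certified, true)]
    ratios = [c / t for c, t in zip(certified, true)]
    return sum(sound) / len(sound), sum(ratios) / len(ratios)


# Made-up radii for illustration only (not results from the paper).
cert = [0.8, 0.45, 1.2]
true = [1.0, 0.5, 1.0]
soundness, tightness = soundness_and_tightness(cert, true)
print(f"soundness={soundness:.2f}, mean tightness={tightness:.2f}")
```

The third made-up point is deliberately unsound (1.2 > 1.0), which is the kind
of case the soundness percentage counts.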


================================================================================
CONVERGENCE ANALYSIS OF RADIUS ESTIMATORS
================================================================================

PURPOSE:
Validate that statistical estimators (variance C, gradient norm ||G||) converge
correctly and certified radii converge to theoretical values as sample size N
increases.

PARAMETERS:
- Single MNIST test image (index 0)
- σ = 0.5, ε_y = 10° (0.175 radians)
- N ∈ {100, 500, 1000, 5000, 10000}
- 10 trials per N value
- Confidence level: 95%
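
The radian value quoted for ε_y follows from the standard degree-to-radian
conversion, shown here as a quick check:

```python
import math

eps_y_deg = 10.0
eps_y_rad = math.radians(eps_y_deg)  # degrees * pi / 180
print(round(eps_y_rad, 3))  # 0.175
```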

COMMAND TO RUN:

    python scripts/run_single_point_convergence_analysis.py \
        --image_idx 0 \
        --sigma 0.5 \
        --eps_y 10.0 \
        --N_values 100 500 1000 5000 10000 \
        --n_trials 10 \
        --confidence 0.95 \
        --device cpu \
        --output convergence_analysis_results.json

Note: Use --device cuda if a GPU is available (roughly 3× faster, per the timings below)

EXPECTED OUTPUT:
- File: convergence_analysis_results.json
- Contains: Estimator convergence data, radius convergence data
- Computation time: ~30-60 minutes (CPU), ~10-20 minutes (GPU)

TO GENERATE CONVERGENCE PLOTS:

    python scripts/plot_theta_convergence_analysis.py \
        --convergence_data convergence_analysis_results.json \
        --output_dir convergence_plots

EXPECTED RESULTS:
- Variance and gradient norm estimates converge to their true values at an O(1/√N) rate
- Confidence intervals shrink proportionally to 1/√N
- At N=10,000, mean bias < 5%
- Empirical certified radii converge to theoretical radii
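
The 1/√N scaling can be illustrated with the textbook normal-approximation
interval for a sample mean. This is only a sketch of the scaling; the estimators
for C and ||G|| in the submission may use different bounds, and ci_half_width is
a hypothetical helper.

```python
import math
from statistics import NormalDist


def ci_half_width(sample_std: float, n: int, confidence: float = 0.95) -> float:
    """Half-width of a two-sided normal-approximation CI for a sample mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # about 1.96 for 95%
    return z * sample_std / math.sqrt(n)


for n in [100, 500, 1000, 5000, 10000]:
    print(n, round(ci_half_width(1.0, n), 4))
```

Going from N=100 to N=10,000 multiplies N by 100 and narrows the interval by a
factor of 10.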


================================================================================
MNIST ROTATION CERTIFICATION AND COMPARISON
================================================================================

PURPOSE:
Evaluate certification methods on the MNIST rotation prediction task (high-dimensional
regression). Compare (E,C)+M, (E,C,G)+M, and the α-trimming baseline.

PARAMETERS:
- 100 stratified test samples (10 per digit class)
- σ ∈ {0.06, 0.12, 0.25, 0.50, 0.75}
- ε_y = 10° (0.175 radians), M = π radians
- N = 10,000 samples per certification

STEP 1: ESTIMATE VARIANCE AND GRADIENT (C, G) FOR EACH SIGMA
--------------------------------------------------------------

This step computes and saves variance and gradient estimates for all test samples.
The estimates are reusable for any ε_y value.

Run for each sigma value:

# σ = 0.06
python scripts/mnist_rotation_full_certification.py \
    --n_test 100 \
    --sigma 0.06 \
    --N_values 10000 \
    --n_trials 1 \
    --stratified \
    --use_rotation_dataset \
    --confidence 0.95 \
    --device cpu \
    --skip_bootstrap \
    --output estimation_sigma0.06_n100.json

# σ = 0.12
python scripts/mnist_rotation_full_certification.py \
    --n_test 100 \
    --sigma 0.12 \
    --N_values 10000 \
    --n_trials 1 \
    --stratified \
    --use_rotation_dataset \
    --confidence 0.95 \
    --device cpu \
    --skip_bootstrap \
    --output estimation_sigma0.12_n100.json

# σ = 0.25
python scripts/mnist_rotation_full_certification.py \
    --n_test 100 \
    --sigma 0.25 \
    --N_values 10000 \
    --n_trials 1 \
    --stratified \
    --use_rotation_dataset \
    --confidence 0.95 \
    --device cpu \
    --skip_bootstrap \
    --output estimation_sigma0.25_n100.json

# σ = 0.50
python scripts/mnist_rotation_full_certification.py \
    --n_test 100 \
    --sigma 0.50 \
    --N_values 10000 \
    --n_trials 1 \
    --stratified \
    --use_rotation_dataset \
    --confidence 0.95 \
    --device cpu \
    --skip_bootstrap \
    --output estimation_sigma0.50_n100.json

# σ = 0.75
python scripts/mnist_rotation_full_certification.py \
    --n_test 100 \
    --sigma 0.75 \
    --N_values 10000 \
    --n_trials 1 \
    --stratified \
    --use_rotation_dataset \
    --confidence 0.95 \
    --device cpu \
    --skip_bootstrap \
    --output estimation_sigma0.75_n100.json

EXPECTED OUTPUT:
- Files: estimation_sigma*.json (one per sigma value)
- Contains: Variance estimates (C_hat, C_upper), gradient estimates (G_norm_hat, G_norm_upper)
- Computation time: ~2-4 hours per sigma value (CPU), ~30-60 min per sigma (GPU)
- Total: ~10-20 hours for all 5 sigma values (CPU)

Note: Use --device cuda if a GPU is available for faster computation.
      The --use_rotation_dataset flag (already included in the commands above)
      selects the rotated MNIST dataset with ground-truth angles.
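
Since the five Step 1 commands differ only in σ and the output name, they can be
generated rather than copied. A minimal sketch, using only the flags shown
above; the helper estimation_command is hypothetical, and the script only
prints the commands.

```python
import shlex
from typing import List

# Kept as strings so that "0.50" is not shortened to "0.5" in file names.
SIGMAS = ["0.06", "0.12", "0.25", "0.50", "0.75"]


def estimation_command(sigma: str, device: str = "cpu") -> List[str]:
    """Build the Step 1 argument list for one sigma value."""
    return [
        "python", "scripts/mnist_rotation_full_certification.py",
        "--n_test", "100",
        "--sigma", sigma,
        "--N_values", "10000",
        "--n_trials", "1",
        "--stratified",
        "--use_rotation_dataset",
        "--confidence", "0.95",
        "--device", device,
        "--skip_bootstrap",
        "--output", f"estimation_sigma{sigma}_n100.json",
    ]


for sigma in SIGMAS:
    # Print only; pass the list to subprocess.run(cmd, check=True) to execute.
    print(shlex.join(estimation_command(sigma)))
```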


STEP 2: COMPUTE CERTIFIED RADII FOR BOTH METHODS
-----------------------------------------------

This step uses the saved estimates to compute certified radii for both methods:
(E,C)+M (without gradient) and (E,C,G)+M (with gradient).

Run for each sigma value:

# σ = 0.06
python scripts/compare_variance_mean_vs_with_gradient.py \
    --mode precomputed \
    --variance_gradient estimation_sigma0.06_n100.json \
    --eps_y_deg 10.0 \
    --N 10000 \
    --trial 0 \
    --output comparison_sigma0.06_eps10deg.json

# σ = 0.12
python scripts/compare_variance_mean_vs_with_gradient.py \
    --mode precomputed \
    --variance_gradient estimation_sigma0.12_n100.json \
    --eps_y_deg 10.0 \
    --N 10000 \
    --trial 0 \
    --output comparison_sigma0.12_eps10deg.json

# σ = 0.25
python scripts/compare_variance_mean_vs_with_gradient.py \
    --mode precomputed \
    --variance_gradient estimation_sigma0.25_n100.json \
    --eps_y_deg 10.0 \
    --N 10000 \
    --trial 0 \
    --output comparison_sigma0.25_eps10deg.json

# σ = 0.50
python scripts/compare_variance_mean_vs_with_gradient.py \
    --mode precomputed \
    --variance_gradient estimation_sigma0.50_n100.json \
    --eps_y_deg 10.0 \
    --N 10000 \
    --trial 0 \
    --output comparison_sigma0.50_eps10deg.json

# σ = 0.75
python scripts/compare_variance_mean_vs_with_gradient.py \
    --mode precomputed \
    --variance_gradient estimation_sigma0.75_n100.json \
    --eps_y_deg 10.0 \
    --N 10000 \
    --trial 0 \
    --output comparison_sigma0.75_eps10deg.json

EXPECTED OUTPUT:
- Files: comparison_sigma*_eps10deg.json (one per sigma value)
- Contains: Certified radii for both (E,C)+M (radius_variance_mean) and (E,C,G)+M (radius_with_gradient) methods
- Computation time: ~1-2 minutes per sigma value (fast, no model evaluation)


STEP 3: COMPUTE ALPHA-TRIMMING BASELINE (OPTIONAL)
---------------------------------------------------

To compare with the α-trimming baseline, run the following for each sigma value
(example for σ=0.06):

python scripts/mnist_alpha_trimming_certification.py \
    --previous_json estimation_sigma0.06_n100.json \
    --sigma 0.06 \
    --alpha 0.49 \
    --P 0.9 \
    --eps_y 10.0 \
    --use_rotation_dataset \
    --output alpha_trimming_sigma0.06_alpha0.49_P0.9.json

Note: α-trimming requires different α values for different sigma.
      Use α=0.49 for σ≤0.12, α=0.35 for σ≥0.25.
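
The α rule above can be written as a small lookup when generating the per-σ
commands. This is a sketch; alpha_for_sigma is a hypothetical helper, and the
guide gives no recommendation for σ strictly between 0.12 and 0.25, so the
sketch raises an error there.

```python
def alpha_for_sigma(sigma: float) -> float:
    """Trimming level from the rule above: 0.49 for sigma <= 0.12, 0.35 for sigma >= 0.25."""
    if sigma <= 0.12:
        return 0.49
    if sigma >= 0.25:
        return 0.35
    # The guide gives no recommendation between 0.12 and 0.25.
    raise ValueError(f"no recommended alpha for sigma={sigma}")


for sigma in ["0.06", "0.12", "0.25", "0.50", "0.75"]:
    alpha = alpha_for_sigma(float(sigma))
    print(f"alpha_trimming_sigma{sigma}_alpha{alpha}_P0.9.json")
```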


STEP 4: GENERATE CDF COMPARISON PLOT
-------------------------------------

Compare certified radii across methods using CDF plot:

python scripts/plot_certified_accuracy_curves.py \
    --comparison_files comparison_sigma*.json \
    --alpha_files alpha_trimming_sigma*.json \
    --tolerance 10.0 \
    --output_dir plots

EXPECTED OUTPUT:
- Plots showing CDF of certified radii for each method at best σ
- (E,C,G)+M: mean radius ~0.207 pixels at σ=0.75, 0% abstain
- (E,C)+M: mean radius ~0.090 pixels at σ=0.06, 1% abstain
- α-trimming: mean radius ~0.120 pixels at σ=0.06, 6% abstain
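
The summary numbers above (mean radius, abstention rate) and a point on the CDF
can be computed from a list of per-sample radii roughly as follows. The encoding
of an abstention as None and the choice to average over non-abstaining samples
only are assumptions; the actual JSON layout and the script's conventions may
differ.

```python
from typing import List, Optional, Tuple


def summarize_radii(radii: List[Optional[float]]) -> Tuple[float, float]:
    """Return (mean certified radius over non-abstaining samples, abstention rate)."""
    certified = [r for r in radii if r is not None]
    abstain_rate = 1.0 - len(certified) / len(radii)
    mean_radius = sum(certified) / len(certified) if certified else 0.0
    return mean_radius, abstain_rate


def empirical_cdf(radii: List[Optional[float]], r: float) -> float:
    """Fraction of non-abstaining samples whose certified radius is <= r."""
    certified = [x for x in radii if x is not None]
    return sum(x <= r for x in certified) / len(certified)


# Made-up radii in pixels; None marks an abstention (an assumed encoding).
radii = [0.10, 0.20, None, 0.30, 0.20]
mean_radius, abstain_rate = summarize_radii(radii)
print(f"mean={mean_radius:.3f}, abstain={abstain_rate:.0%}, CDF(0.2)={empirical_cdf(radii, 0.2):.2f}")
```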


================================================================================
CERTIFIED ACCURACY ANALYSIS
================================================================================

PURPOSE:
Evaluate practical utility using certified accuracy (absolute and conditional)
and certified mean distance metrics at different radius thresholds.

PREREQUISITES:
- Certification results from MNIST Rotation Certification (all sigma values)
- α-trimming results (optional, for comparison)

PARAMETERS:
- Radius thresholds R: [0.05, 0.10, 0.15, 0.20, 0.25] pixels
- Correctness tolerance: 10° (for certified accuracy)
- For each method, select best σ based on sum of certified accuracies

COMMAND TO COMPUTE ABSOLUTE CERTIFIED ACCURACY:

    python scripts/compute_certified_accuracy_best_sigma.py \
        --comparison_dir . \
        --alpha_dir alpha_trimming_results \
        --tolerance 10.0 \
        --R_values 0.05 0.10 0.15 0.20 0.25 \
        --sigmas 0.06 0.12 0.25 0.5 0.75 \
        --output certified_accuracy_absolute_table.tex

COMMAND TO COMPUTE CONDITIONAL CERTIFIED ACCURACY:

    python scripts/compute_certified_accuracy_best_sigma.py \
        --comparison_dir . \
        --alpha_dir alpha_trimming_results \
        --tolerance 10.0 \
        --R_values 0.05 0.10 0.15 0.20 0.25 \
        --sigmas 0.06 0.12 0.25 0.5 0.75 \
        --normalize_by_certified \
        --output certified_accuracy_conditional_table.tex

COMMAND TO COMPUTE CERTIFIED MEAN DISTANCE:

    python scripts/compute_certified_mean_distance_best_sigma.py \
        --comparison_dir . \
        --alpha_dir alpha_trimming_results \
        --R_values 0.05 0.10 0.15 0.20 0.25 \
        --output certified_mean_distance_table.tex

COMMAND TO GENERATE COMBINED METRICS TABLE:

    python scripts/compute_combined_certified_metrics_table.py \
        --comparison_dir . \
        --alpha_dir alpha_trimming_results \
        --tolerance 10.0 \
        --R_values 0.05 0.10 0.15 0.20 0.25 \
        --sigmas 0.06 0.12 0.25 0.5 0.75 \
        --output certified_metrics_combined_table.tex

EXPECTED OUTPUT:
- LaTeX tables with certified accuracy and mean distance metrics
- (E,C,G)+M: 95% absolute accuracy, 100% conditional accuracy at R=0.15, mean dist=3.90°
- (E,C)+M: 2% absolute accuracy, 100% conditional accuracy at R=0.15, mean dist=5.80°
- α-trimming: 39% absolute accuracy, 100% conditional accuracy at R=0.15, mean dist=4.65°
- Computation time: ~1-2 minutes per table

NOTES:
- The script automatically finds best σ for each method at each threshold
- --comparison_dir: Directory containing comparison_*.json files
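- --alpha_dir: Directory containing alpha_trimming_*.json files (optional). Step 3 of
  the MNIST rotation section writes these files to the current directory, so either
  move them into alpha_trimming_results/ or pass --alpha_dir . instead
- If alpha files are not available, the script skips the α-trimming comparison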
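
The three metrics can be sketched as below. The definitions are one reading of
the metric names and of the --normalize_by_certified flag, not a transcription
of the scripts: absolute accuracy divides by all samples, conditional accuracy
divides by the samples certified at radius R, and the certified mean distance
averages the angular error over the certified samples. The function name, the
use of 0.0 for abstentions, and the data are illustrative.

```python
from typing import List, Tuple


def certified_metrics(
    radii: List[float], errors_deg: List[float], R: float, tolerance: float = 10.0
) -> Tuple[float, float, float]:
    """Return (absolute accuracy, conditional accuracy, certified mean distance) at radius R.

    radii[i] is the certified radius of sample i (0.0 for an abstention) and
    errors_deg[i] its absolute prediction error in degrees.
    """
    certified = [e for r, e in zip(radii, errors_deg) if r >= R]
    if not certified:
        return 0.0, 0.0, float("nan")
    n_correct = sum(e <= tolerance for e in certified)
    absolute = n_correct / len(radii)          # normalised by all samples
    conditional = n_correct / len(certified)   # normalised by certified samples
    mean_distance = sum(certified) / len(certified)
    return absolute, conditional, mean_distance


# Made-up values for illustration only.
radii = [0.20, 0.05, 0.18, 0.00]
errors = [3.0, 4.0, 12.0, 25.0]
print(certified_metrics(radii, errors, R=0.15))  # (0.25, 0.5, 7.5)
```

Under this reading, a method that certifies few samples but gets all of them
right has low absolute accuracy and 100% conditional accuracy, which is the
pattern reported for (E,C)+M above.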


================================================================================
CERTIFICATE VALIDATION: TIGHTNESS AND SOUNDNESS
================================================================================

================================================================================
PART A: TIGHTNESS ANALYSIS (PSEUDO-TRUE RADIUS)
================================================================================

PURPOSE:
Compare certified radii with pseudo-true radii computed using PGD optimization
to evaluate certificate tightness.

PARAMETERS:
- 100 MNIST rotation test samples
- σ = 0.5, ε_y = 10° (0.175 radians)
- PGD: 5 restarts, 100 steps, R_max = 5.0 pixels
- MC samples: N_mc = 50,000 (baseline), N_attack = 2,000 (per PGD step)

PREREQUISITES:
- Variance/gradient estimates: estimation_sigma0.50_n100.json
- Certified radii: comparison_sigma0.50_eps10deg.json

COMMAND TO COMPUTE PSEUDO-TRUE RADIUS:

    python scripts/compute_mnist_pseudo_true_radius_simple.py \
        --variance_gradient estimation_sigma0.50_n100.json \
        --sigma 0.5 \
        --eps_y_deg 10.0 \
        --n_points 100 \
        --n_mc 50000 \
        --n_attack 2000 \
        --n_restarts 5 \
        --n_steps 100 \
        --R_max 5.0 \
        --tolerance 0.001 \
        --use_rotation_dataset \
        --device cuda \
        --output pseudo_true_radius_sigma0.5_n100.json

EXPECTED OUTPUT:
- File: pseudo_true_radius_sigma0.5_n100.json
- Contains: Pseudo-true radii for all 100 samples
- Computation time: ~10-20 hours (GPU), ~50-100 hours (CPU)
- Note: This is computationally expensive (~1M model evaluations per sample)
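
The ~1M figure follows from the PGD settings, assuming each PGD step evaluates
the model once per Monte Carlo sample (gradient passes are not counted
separately here):

```python
n_restarts = 5     # PGD restarts
n_steps = 100      # PGD steps per restart
n_attack = 2_000   # Monte Carlo samples per PGD step
n_mc = 50_000      # Monte Carlo samples for the baseline estimate

attack_evals = n_restarts * n_steps * n_attack
total_evals = attack_evals + n_mc
print(attack_evals, total_evals)  # 1000000 1050000
```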

COMMAND TO ANALYZE TIGHTNESS:

    python scripts/analysis/analyze_pseudo_radius_results.py \
        --directory . \
        --certified comparison_sigma0.50_eps10deg.json \
        --plot \
        --output_dir tightness_plots

COMMAND TO CREATE TIGHTNESS FIGURES:

    python scripts/analysis/create_tightness_figures.py \
        --pseudo_dir . \
        --certified_file comparison_sigma0.50_eps10deg.json \
        --sigma 0.5 \
        --method with_gradient \
        --output_dir tightness_plots

EXPECTED RESULTS:
- Mean ratio (pseudo-true / certified): ~3.5× for (E,C,G)+M
- 79% of samples within R_max bound (21% hit bound, excluded from ratio)
- Most ratios fall between 2× and 5×, indicating consistent conservatism
- 2% of samples have ratio < 1.0 (finite-sample estimation error, within the theoretical 5% failure rate)
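
How these summary statistics relate to the per-sample radii can be sketched as
follows. The exclusion rule (drop samples whose pseudo-true radius reached
R_max) follows the description above; the function name, the choice to report
the "ratio < 1" fraction among the kept samples, and the data are assumptions.

```python
from typing import List, Tuple


def tightness_summary(
    certified: List[float], pseudo_true: List[float], r_max: float = 5.0
) -> Tuple[float, float, float]:
    """Return (mean ratio, fraction of samples below R_max, fraction of kept ratios < 1)."""
    # Keep only samples whose PGD search stopped strictly below the R_max cap.
    ratios = [p / c for c, p in zip(certified, pseudo_true) if p < r_max]
    within = len(ratios) / len(certified)
    below_one = sum(r < 1.0 for r in ratios) / len(ratios)
    return sum(ratios) / len(ratios), within, below_one


# Made-up radii in pixels for illustration only.
cert = [0.20, 0.25, 0.50, 0.30]
pseudo = [0.80, 0.75, 0.40, 5.00]
mean_ratio, within, below_one = tightness_summary(cert, pseudo)
print(f"mean ratio={mean_ratio:.2f}, within R_max={within:.0%}, ratio<1: {below_one:.0%}")
```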


================================================================================
COMPUTATIONAL REQUIREMENTS
================================================================================

Estimated computation time (single-threaded CPU unless noted):
- Validating Certificate Tightness on Synthetic Data: ~5-10 minutes per combination
- Convergence Analysis: ~30-60 minutes
- MNIST Rotation Step 1 (Estimation): ~10-20 hours (all 5 sigma values)
- MNIST Rotation Step 2 (Certification): ~5-10 minutes (all 5 sigma values)
- Certified Accuracy Analysis: ~1-2 minutes (post-processing only)
- Certificate Validation (Tightness): ~10-20 hours (GPU), ~50-100 hours (CPU)

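TOTAL: ~20-40 hours (CPU, with the tightness analysis run on GPU), ~10-20 hours (all on GPU)
Running the tightness analysis on CPU as well raises the total to ~60-120 hours.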

GPU acceleration:
- Add --device cuda to all scripts that support it
- Expected speedup: 3-5× for estimation, 5-10× for validation
- GPU memory requirement: ~4-8 GB

Parallelization:
- Estimation (MNIST Rotation Step 1): Can run different sigma values in parallel
- Tightness analysis: Can split across samples and run in parallel
- Each can be distributed across multiple machines/GPUs
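
On a single machine, independent runs can be launched side by side from Python.
A minimal sketch with the standard library; run_parallel is a hypothetical
helper, and the demo uses stand-in commands rather than the real scripts.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from typing import List


def run_parallel(commands: List[List[str]], max_workers: int = 2) -> List[int]:
    """Run each command as its own process and return the exit codes in order."""
    def run_one(cmd: List[str]) -> int:
        return subprocess.run(cmd).returncode

    # Threads only wait on the child processes, so the GIL is not a bottleneck.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, commands))


# Stand-in commands; replace them with the Step 1 argument lists, one per sigma.
demo = [[sys.executable, "-c", f"print('sigma {s}')"] for s in ("0.06", "0.12")]
print(run_parallel(demo))  # [0, 0]
```

When several runs share one GPU, max_workers should be chosen so that the
combined memory use stays within the device (each run needs roughly 4-8 GB
according to the figures above).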


================================================================================
DIRECTORY STRUCTURE AFTER RUNNING EXPERIMENTS
================================================================================

After running all experiments, the directory should contain:

robust_reg_submission/
├── src/                                    # Core implementation
├── scripts/                                # Experiment scripts
├── experiments/                            # Experiment setup
├── pyproject.toml                          # Package configuration
├── README.md                               # Quick start guide
├── REPRODUCTION_GUIDE.txt                  # This file
│
├── estimation_sigma*.json                  # MNIST Rotation Step 1 outputs
├── comparison_sigma*.json                  # MNIST Rotation Step 2 outputs
├── alpha_trimming_sigma*.json              # MNIST Rotation Step 3 outputs (optional)
├── synthetic_validation_results.json       # Synthetic Validation output
├── convergence_analysis_results.json       # Convergence Analysis output
├── pseudo_true_radius_*.json               # Tightness Analysis output
│
├── certified_accuracy_*.tex                # Certified Accuracy tables
├── certified_mean_distance_*.tex           # Certified mean distance table
├── certified_metrics_combined_table.tex    # Certified Accuracy combined table
│
├── plots/                                  # CDF comparison figures (MNIST Rotation Step 4)
├── convergence_plots/                      # Convergence Analysis figures
└── tightness_plots/                        # Tightness Analysis figures


================================================================================
TROUBLESHOOTING
================================================================================

Import Errors:
    Q: "ModuleNotFoundError: No module named 'alpha_smoothing_repro'"
    A: Run "pip install -e ." from the root directory

    Q: "ModuleNotFoundError: No module named 'e2cnn'"
    A: Install e2cnn: "pip install e2cnn"

MNIST Dataset:
    Q: MNIST dataset not found
    A: The scripts will automatically download MNIST to ./data/ on first run

Model File:
    Q: "FileNotFoundError: experiments/mnist_rotation/e2cnn_rotation_model.pth"
    A: The trained model should be in experiments/mnist_rotation/
       If missing, you need to train it first using train_e2cnn_rotation.py

Memory Issues:
    Q: Out of memory during estimation
    A: Use --skip_bootstrap flag to reduce memory usage
       Reduce batch size in the code if still having issues

GPU Issues:
    Q: CUDA out of memory
    A: Reduce batch size in mnist_rotation_full_certification.py
       Use --device cpu to fall back to CPU (slower, but system RAM is typically larger than GPU memory)

Computation Time:
    Q: Taking too long
    A: Use GPU if available (--device cuda)
       Run different sigma values in parallel on multiple machines
       Reduce n_trials for convergence analysis


================================================================================
MINIMAL REPRODUCTION (FOR QUICK VERIFICATION)
================================================================================

If you want to quickly verify the code works without running full experiments:

1. Quick synthetic validation (1 test point):
    python scripts/analysis/test_unbounded_certifiers_synthetic.py \
        --function unbounded_quadratic \
        --sigma 0.1 \
        --eps_y 0.5 \
        --N_samples 1000 \
        --n_test_points 1 \
        --compute_true_radius

2. Quick MNIST estimation (10 samples):
    python scripts/mnist_rotation_full_certification.py \
        --n_test 10 \
        --sigma 0.5 \
        --N_values 1000 \
        --n_trials 1 \
        --stratified \
        --use_rotation_dataset \
        --skip_bootstrap

3. Quick certification:
    python scripts/compare_variance_mean_vs_with_gradient.py \
        --mode precomputed \
        --variance_gradient <estimation_file>.json \
        --eps_y_deg 10.0 \
        --N 1000

Total time: ~5-10 minutes


================================================================================
DATA FILES LOCATION
================================================================================

Input Data:
- MNIST dataset: Downloaded automatically to ./data/MNIST/
- Trained model: experiments/mnist_rotation/e2cnn_rotation_model.pth

Output Data (created by experiments):
- Estimation results: estimation_sigma*.json
- Certification results: comparison_*.json
- Validation results: pseudo_true_radius_*.json
- Tables: *.tex files
- Plots: plots/, convergence_plots/, and tightness_plots/ directories


================================================================================
NOTES ON REPRODUCIBILITY
================================================================================

Random Seeds:
- All scripts use fixed random seeds for reproducibility
- Default seed: 42 (can be changed via --seed argument)
- Stratified sampling with the fixed seed selects the same test samples across runs

Statistical Variation:
- Monte Carlo estimation introduces statistical variation
- Results may vary slightly across runs due to sampling
- Use more samples (larger N) for more stable results
- Confidence intervals account for statistical uncertainty

Computational Resources:
- Full experiments are designed for a GPU cluster (~10-20 hours)
- They can be run on CPU but take longer (~20-40 hours excluding the tightness
  analysis, which alone takes ~50-100 hours on CPU)
- Consider running different sigma values in parallel


================================================================================
END OF REPRODUCTION GUIDE
================================================================================
