# Uncertainty Reduction of Class-Conditional Conformal Prediction via Multi-Input Aggregation

This repository contains the code necessary to reproduce the experimental results presented in our submission.

## General

-   `utils.py`: Contains all functions used for constructing prediction sets and generating synthetic data.

To accelerate computation, quantiles for certain distributions are precomputed:

-   `Bin_quantile_lvl0.1_m100`: Contains quantiles of the Binomial distribution B(m, \alpha) for m = 1, ..., 100 and \alpha = 0.1 (values used in the paper).
-   `nHgeo_quantile_lvl0.1_m100n1000`: Contains quantiles of the Beta-Binomial distribution BetaBinomial(m, n+1-k, k) for m = 1, ..., 100; n = 1, ..., 1000; and k = \lfloor(n+1)\alpha\rfloor.
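
As an illustration of what such a precomputed table can look like, the sketch below builds a list of Binomial quantiles for m = 1, ..., 100 using only the standard library. It is not the code used to produce the files above: the function names and the quantile level `q` are assumptions made for this example, since the level at which the stored quantiles are evaluated is not stated here.

```python
from math import comb


def binomial_quantile(q, m, p):
    """Return the smallest k with P(Bin(m, p) <= k) >= q.

    Hypothetical helper for illustration; `q` is an assumed quantile level.
    """
    cdf = 0.0
    for k in range(m + 1):
        # Add the probability mass P(X = k) to the running CDF.
        cdf += comb(m, k) * p**k * (1 - p) ** (m - k)
        if cdf >= q:
            return k
    # Guard against floating-point rounding when q is very close to 1.
    return m


def precompute_binomial_quantiles(q, alpha=0.1, m_max=100):
    """Quantiles of B(m, alpha) for m = 1, ..., m_max, stored in a list."""
    return [binomial_quantile(q, m, alpha) for m in range(1, m_max + 1)]


# Assumed level q = 0.95; entry m - 1 of the table corresponds to B(m, 0.1).
table = precompute_binomial_quantiles(q=0.95)
```

Computing the table once and looking values up afterwards avoids re-evaluating the CDF inside every experiment repetition.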

## Synthetic Data Experiments

-   `CP_Truedata.py`: Runs all experiments on synthetic data and generates the corresponding figures (Figures 2, 4, and 5 in the paper).

## Real Data Experiments

### Preprocessing

The `Pre-processing PlantClef` directory contains all necessary functions to compute softmax scores before applying conformal prediction methods.

-   **Dataset**: We use data from LifeClef 2015 ([LifeClef 2015 Plant Task](https://www.imageclef.org/lifeclef/2015/plant)), consisting of two datasets that need to be downloaded:

    -   `PlantCLEF2015TrainingData`
    -   `PlantCLEF2015TestDataWithAnnotations`

-   **Model**: We fine-tune the pre-trained model `resnet50_weights_best_acc.tar` provided by Garcin et al. (2021) ([GitHub Repository](https://github.com/plantnet/PlantNet-300K/)), which must also be downloaded. The file `utils_PlantNet300K.py` comes from this source.

-   **Data Split**:

    -   `Newsplit.py`: Splits the merged dataset into three equally sized parts, preserving observation structure (i.e., images from the same observation remain in the same split). Only the indices of the new sets are returned.

-   **Training**:

    -   `model_training.py`: Trains the models using the new split. It relies on `utils_PlantNet300K.py` (from the [PlantNet-300K repository](https://github.com/plantnet/PlantNet-300K/)) to properly load the pre-trained model.

-   **Score Computation**:

    -   `softmax_computation.py`: Computes the softmax scores for both the test and calibration datasets.
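
The observation-preserving split described for `Newsplit.py` can be sketched as follows. This is a minimal illustration and not the script's actual implementation: the function name, the `seed` argument, and the greedy balancing rule are assumptions, and the parts it returns are only approximately equal in size.

```python
import random
from collections import defaultdict


def split_by_observation(observation_ids, n_parts=3, seed=0):
    """Split image indices into n_parts lists, keeping each observation together.

    observation_ids[i] is the observation that image i belongs to.
    Only indices are returned (hypothetical helper for illustration).
    """
    # Group image indices by the observation they belong to.
    groups = defaultdict(list)
    for idx, obs in enumerate(observation_ids):
        groups[obs].append(idx)

    # Shuffle observations reproducibly.
    obs_keys = sorted(groups)
    random.Random(seed).shuffle(obs_keys)

    # Greedy balancing: give each observation to the currently smallest part.
    parts = [[] for _ in range(n_parts)]
    for obs in obs_keys:
        smallest = min(parts, key=len)
        smallest.extend(groups[obs])
    return [sorted(part) for part in parts]


# Toy example: nine images belonging to five observations.
obs_ids = ["a", "a", "b", "c", "c", "c", "d", "e", "e"]
parts = split_by_observation(obs_ids)
```

Splitting at the observation level rather than the image level keeps images of the same plant from appearing on both sides of a split.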

### Conformal Prediction

-   `CP_Truedata.py`: Computes the coverage and average prediction set size (each both marginal and conditional) of the different methods on real data, across various splits.

-   `plots_truedata.py`: Generates Figures 3, 7, and 8 related to real data. The necessary data files are provided:

    -   `Neurips_Plantnet_shuffle_temp1alpha0.1minclasssize20.npy`
    -   `Neurips_Plantnet_shuffle_temp20alpha0.1minclasssize20.npy`
    -   `Neurips_Plantnet_trueobs_temp1alpha0.1minclasssize20.npy`
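
For readers unfamiliar with these metrics, the sketch below shows one way to compute coverage and average set size, both marginally and per class. It assumes that "conditional" means conditional on the true class, as the title suggests; the function name and the representation of prediction sets as Python sets of labels are choices made for this example, not the interface of `CP_Truedata.py`.

```python
from collections import defaultdict


def evaluate_sets(pred_sets, labels):
    """Return marginal and per-class coverage and average set size.

    pred_sets[i] is the set of labels predicted for sample i,
    labels[i] is its true label (hypothetical helper for illustration).
    """
    n = len(labels)
    hits = [y in s for s, y in zip(pred_sets, labels)]
    sizes = [len(s) for s in pred_sets]

    # Marginal metrics: averages over all samples.
    marginal_cov = sum(hits) / n
    marginal_size = sum(sizes) / n

    # Class-conditional metrics: averages within each true class.
    count = defaultdict(int)
    hit_sum = defaultdict(int)
    size_sum = defaultdict(int)
    for y, hit, size in zip(labels, hits, sizes):
        count[y] += 1
        hit_sum[y] += hit
        size_sum[y] += size
    class_cov = {y: hit_sum[y] / count[y] for y in count}
    class_size = {y: size_sum[y] / count[y] for y in count}
    return marginal_cov, marginal_size, class_cov, class_size


# Toy example with four samples and three classes.
pred_sets = [{0, 1}, {1}, {0}, {1, 2}]
labels = [0, 1, 1, 2]
marginal_cov, marginal_size, class_cov, class_size = evaluate_sets(pred_sets, labels)
```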
