# Ordinal Neural Collapse (ONC) - Code Repository

**Quick overview:** Sec. 1 presents the repository structure and Sec. 2 describes the datasets. For a quick start, install the conda environment following Sec. 3, then run the main experiments on a single GPU following Sec. 4 to reproduce Fig. 2 and Fig. 4 of the main paper. This requires approximately 3.6 GB of GPU memory and takes about 30 minutes. If you encounter out-of-memory errors or want faster execution, change the parallelism parameter as explained in Sec. 4.

## 1. Repository Structure

```text
onc_experiments/                                   # ← unzip creates this root folder
├── datasets-orreview/ordinal-regression/          # five public datasets (30-holdout splits)
│   ├── car
│   ├── ERA
│   ├── LEV
│   ├── SWD
│   └── winequality-red
│
├── experiment_metric_curve/                       # Experiment 1: metric curves
│   ├── run.sh                                     # Main script (parallel over folds)
│   ├── experiment.py                              
│   ├── metrics.py                                 
│   ├── models.py      
│   ├── utils_data.py                            
│   ├── training.py                                
│   ├── plot.py
│   ├── combine_plots.py                                    
│   ├── logs/                                      # runtime logs
│   ├── results/                                   # per-fold CSVs
│   └── vis/aggregated/                            # combined figures (combine_plots.py)
│
├── experiment_visualization/                      # Experiment 2: feature and latent-space visualizations
│   ├── run_vis.sh                                 # Main script
│   ├── experiment_vis.py                          
│   ├── models.py
│   ├── utils_data.py
│   ├── combine_plots_vis.py                                 
│   ├── logs_vis/                                  # runtime logs
│   └── vis/                                       
│       ├── ./*_feat_epochs/                       # feature space visualizations
│       ├── ./*_z_epochs/                          # latent space visualizations
│       └── combined/                              # combined figures (combine_plots_vis.py)
│
├── environment.yml                                # conda environment
├── LICENSE
└── README.md
```


## 2. Datasets

This repository **already includes** all five ordinal regression datasets used in the study: car, ERA, LEV, SWD, and winequality-red (CA, ER, LE, SW, and WR in the paper).  
**No download is required**; you can proceed directly to Sec. 3 and Sec. 4 to run the experiments.

These datasets originate from the **AYRNA** Research Group's Ordinal Regression Review collection:  
<https://www.uco.es/grupos/ayrna/orreview>

As shown in the Repository Structure section, only the five datasets relevant to our experiments are included. Each dataset contains **30 pre-defined train/test splits** (30-holdout), matching the experimental setup described in the paper.
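
If you want to confirm that the dataset folders are in place before launching anything, a minimal sketch along the following lines may help. It only checks that the five folder names from the Repository Structure exist under `datasets-orreview/ordinal-regression/`. It does not inspect the split files themselves, whose naming is not described here. The helper name `missing_datasets` is hypothetical and not part of the repository code.

```python
from pathlib import Path

# Folder names taken from the Repository Structure section.
EXPECTED_DATASETS = ("car", "ERA", "LEV", "SWD", "winequality-red")


def missing_datasets(repo_root):
    """Return the expected dataset folders that are absent under repo_root.

    Hypothetical helper: it only tests for directory existence and makes
    no assumption about the files stored inside each dataset folder.
    """
    data_dir = Path(repo_root) / "datasets-orreview" / "ordinal-regression"
    return [name for name in EXPECTED_DATASETS if not (data_dir / name).is_dir()]
```

Calling `missing_datasets(".")` from `onc_experiments/` should return an empty list when all five datasets are present.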

**Note:** If you prefer to download the datasets manually, you can get `datasets-orreview.zip` from the URL above. For consistency with our experiments, we recommend keeping only the five datasets listed in the Repository Structure and removing all others to maintain a clean repository.

## 3. Installation & Setup

This repository is self-contained with all necessary code and configuration. Create the conda environment as follows:

```bash
# Create the conda environment
conda env create -f environment.yml
conda activate onc
```

**Note:** You may want to verify GPU availability with `python -c "import torch; print('CUDA available:', torch.cuda.is_available())"` before running experiments.


## 4. Running Experiments

### Reproducing Fig. 2 - Metric Curves (ERA dataset with logit model)

The metric curves shown in **Fig. 2 of the paper** can be reproduced with the command below.  
Our `run.sh` launches several folds **in parallel on a single GPU**:

* **GPU memory usage** ≈ 360 MB per fold  
* **Default setting** runs 10 folds in parallel (≈ 3.6 GB total GPU memory)  
* **Customization:** set any value from **1 to 30** folds, depending on your GPU memory

```bash
# From onc_experiments/
cd experiment_metric_curve

# dataset = ERA, link = logit, parallelism = 10 folds
bash run.sh ERA logit 10     
# ➜ figure written to vis/aggregated/ERA/logit/ERA_logit_combined_metrics.png
```

Adjust the last argument based on your available GPU memory;  
setting it to `30` uses maximum parallelism but requires about 11 GB of GPU memory.
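
To choose a value for the last argument, the arithmetic above can be sketched as follows. This is an illustration only: the 360 MB per-fold figure is the approximate number quoted in this README, the helper names are hypothetical, and `free_gpu_mb` assumes a PyTorch version that provides `torch.cuda.mem_get_info`.

```python
MB_PER_FOLD = 360   # approximate per-fold GPU memory quoted in this README
TOTAL_FOLDS = 30    # number of holdout splits per dataset


def max_parallel_folds(free_mb, mb_per_fold=MB_PER_FOLD, total_folds=TOTAL_FOLDS):
    """Largest fold count, capped at total_folds, that fits in free_mb.

    A result of 0 means that not even a single fold fits.
    """
    if free_mb < 0:
        raise ValueError("free_mb must be non-negative")
    return min(total_folds, int(free_mb // mb_per_fold))


def free_gpu_mb(device=0):
    """Free memory on a CUDA device in MB (requires PyTorch with CUDA)."""
    import torch  # imported lazily so max_parallel_folds works without PyTorch

    free_bytes, _total_bytes = torch.cuda.mem_get_info(device)
    return free_bytes / 1024**2
```

For example, `max_parallel_folds(3600)` gives 10, matching the default setting. Because the per-fold figure is approximate, it is probably wise to leave some headroom and pass a slightly smaller value to `run.sh`.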

The resulting figure should match Fig. 2 in the paper.

### Reproducing Fig. 4 - Feature & Latent Space Visualization (ERA dataset with logit model)

Run the visualization pipeline:

```bash
# From onc_experiments/
cd experiment_visualization
bash run_vis.sh ERA logit
# ➜ combined PNGs written to:
#    vis/combined/ERA_logit_fix_f29_combined_visualization.png
#    vis/combined/ERA_logit_learn_f29_combined_visualization.png
```

* **Scope:** uses the last fold (`f29` in the file names) and both threshold variants (fixed `fix` / learnable `learn`)

The resulting image(s) should match Fig. 4 in the paper.
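
To check that the pipeline produced the expected files, a small sketch like the one below can be used. It assumes the naming pattern `<dataset>_<link>_<variant>_f<fold>_combined_visualization.png`, which is inferred from the two ERA/logit file names shown above and may not hold for other settings. The helper name `expected_vis_outputs` is hypothetical.

```python
from pathlib import Path

THRESHOLD_VARIANTS = ("fix", "learn")  # fixed / learnable thresholds


def expected_vis_outputs(dataset, link, fold=29, vis_dir="vis/combined"):
    """Build the expected combined-figure paths for one dataset and link.

    Assumption: file names follow the pattern seen in the ERA/logit example.
    """
    return [
        Path(vis_dir) / f"{dataset}_{link}_{variant}_f{fold}_combined_visualization.png"
        for variant in THRESHOLD_VARIANTS
    ]
```

Run from `experiment_visualization/`, the expression `[p for p in expected_vis_outputs("ERA", "logit") if not p.exists()]` lists any figure that is still missing.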

**Note:** To reproduce the complete experimental results from **Appendix C** of the paper (datasets ER, LE, SW, CA, and WR with both logit and probit models), run all experiments using the commands below. With the default setting of 10 folds in parallel, this takes approximately **20 hours** on a single GPU:

```bash
# Processes all 5 datasets (ER, LE, SW, CA, WR) with both link functions (logit and probit)
# From onc_experiments/
cd experiment_metric_curve
bash run.sh
cd ..   # return to onc_experiments/ before the next step

# For all feature & latent-space visualizations
# From onc_experiments/
cd experiment_visualization
bash run_vis.sh
```
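
If you would rather run the full sweep with a parallelism other than the default, one option is to generate the per-dataset commands explicitly. The sketch below follows the `bash run.sh <dataset> <link> <parallelism>` form shown for ERA/logit. It assumes that the other dataset folder names and `probit` are accepted as arguments in the same way, which this README does not confirm. The helper name `sweep_commands` is hypothetical.

```python
import itertools

# Assumption: run.sh accepts the dataset folder names as its first argument.
DATASETS = ("ERA", "LEV", "SWD", "car", "winequality-red")
LINKS = ("logit", "probit")


def sweep_commands(parallelism=10):
    """Return one run.sh command (as an argument list) per dataset/link pair."""
    if not 1 <= parallelism <= 30:
        raise ValueError("parallelism must be between 1 and 30")
    return [
        ["bash", "run.sh", dataset, link, str(parallelism)]
        for dataset, link in itertools.product(DATASETS, LINKS)
    ]
```

Each command can then be executed with `subprocess.run(cmd, check=True, cwd="experiment_metric_curve")` from `onc_experiments/`.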
---

## 5. License

This code is released under the MIT License, with an anonymous signature to maintain the double-blind review process.

All datasets used in this research are from the AYRNA Research Group's Ordinal Regression Review collection. These datasets have been properly cited in our paper, and their use complies with the original licensing terms.

See the [LICENSE](LICENSE) file for MIT license details regarding our code.