# Inconsistency‑Aware Minimization (IAM) — Supplementary Material

This repository accompanies our NeurIPS 2025 submission **“Inconsistency‑Aware Minimization: Improving Generalization with Unlabeled Data.”** 

> **PyTorch version notice**  Earlier drafts mistakenly referred to *PyTorch 2.7*.  All experiments were actually conducted with **PyTorch 2.5.1**.

---

## 1  Quick‑start (TL;DR)

```bash
# create & activate isolated environment
conda create -n iam python=3.10 -y
conda activate iam

# install dependencies
pip install -r requirements.txt

# run CIFAR‑10 experiment (IAM‑D)
python3 train.py --optimizer IAM --beta 1.0 --rho 0.1 --semi False

# run CIFAR-10 experiment (IAM-S)
python3 train2.py
```

---

## 2  Repository layout

```
├── src/
│   ├── data.py      # data loaders & augmentation
│   ├── model.py     # WideResNet, 6CNN backbone
│   ├── sam.py       # Sharpness‑Aware Minimization
│   ├── IAM.py       # IAM optimizers, loss, estimating local inconsistency
│   ├── train.py     # IAM-D training
│   ├── train2.py    # IAM-S training
│   ├── simclr.py    # self‑supervised experiments
│   └── toy.ipynb    # code for toy example in Appendix D
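├── figures/
│   ├── heatmap10.png   # heatmap of hyperparameter sensitivity in IAM-D (CIFAR-10)
│   ├── heatmap100.png  # heatmap of hyperparameter sensitivity in IAM-D (CIFAR-100)
│   ├── IAM-S_rho.png   # test error of IAM-S on CIFAR-100 for different rho
│   └── K_vs_LI.png     # number of steps K vs. estimated local inconsistency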
├── requirements.txt # Python deps (PyTorch ≥ 2.5.1)
├── LICENSE          # MIT License for our code
└── LICENSE.third_party  # licenses for external code (see below)
```

---

## 3  Reproducing results

| Paper section                    | Command (seed 0 example)                                                     |
| -------------------------------- | ---------------------------------------------------------------------------- |
| Supervised CIFAR‑10  (IAM‑D)     | `python3 train.py --dataset CIFAR-10 --optimizer IAM --beta 1.0 --rho 0.1`    |
| Supervised CIFAR‑100 (IAM‑D)     | `python3 train.py --dataset CIFAR-100 --optimizer IAM --beta 10.0 --rho 0.1`  |
| Supervised CIFAR‑10  (IAM‑S)     | `python3 train2.py --dataset CIFAR-10  --rho 0.1` |
| Supervised CIFAR‑100 (IAM‑S)     | `python3 train2.py --dataset CIFAR-100 --rho 0.5` |
| Semi‑supervised CIFAR‑10 (IAM‑D) | `python3 train.py --optimizer IAM --beta 1.0 --rho 0.1 --semi True` |
| Self‑supervised SimCLR (IAM‑D)   | `python3 simclr.py` |
| Toy example (Appendix D)         | Run `toy.ipynb` |


---

## 4  Additional details and figures

### 4.1 Role of $K$ in estimating local inconsistency $S_\rho(\theta)$
<p align="center">
  <img src="./figures/K_vs_LI.png" alt="Estimated local inconsistency versus number of steps K" width="512"/>
</p>

<p align="center">
  <sub><em>Number of steps K versus estimated local inconsistency for Algorithm 1 and Projected Gradient Ascent (PGA).</em></sub>
</p>

Algorithm 1 approximates the local inconsistency within a few steps. For $K > 1$, Algorithm 1 also solves the maximization problem under the constraint $\|\delta\| \le \rho$ better than PGA.

Thus $K=1$ already offers an efficient approximation of local inconsistency, as mentioned in Appendix C.
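
For readers unfamiliar with the PGA baseline, the following is a minimal sketch of projected gradient ascent under the constraint $\|\delta\| \le \rho$. It is not the code in `src/IAM.py` and it is not Algorithm 1. The objective is a hypothetical toy quadratic standing in for the inconsistency measure, and the names `project_to_ball`, `pga_maximize`, `toy_objective`, and `toy_grad` are illustrative assumptions.

```python
import math


def project_to_ball(delta, rho):
    """Project delta onto the Euclidean ball of radius rho."""
    norm = math.sqrt(sum(d * d for d in delta))
    if norm <= rho or norm == 0.0:
        return list(delta)
    return [d * rho / norm for d in delta]


def pga_maximize(grad_fn, delta0, rho, step_size, num_steps):
    """Run K = num_steps steps of projected gradient ascent."""
    delta = project_to_ball(delta0, rho)
    for _ in range(num_steps):
        grad = grad_fn(delta)
        stepped = [d + step_size * g for d, g in zip(delta, grad)]
        delta = project_to_ball(stepped, rho)
    return delta


# Hypothetical stand-in objective: 0.5 * sum_i c_i * delta_i^2.
# Its maximum on the ball is 0.5 * max(c) * rho^2.
CURVATURE = [3.0, 1.0]


def toy_objective(delta):
    return 0.5 * sum(c * d * d for c, d in zip(CURVATURE, delta))


def toy_grad(delta):
    return [c * d for c, d in zip(CURVATURE, delta)]


rho = 0.1
for k in (1, 3, 50):
    delta_k = pga_maximize(toy_grad, [0.01, 0.01], rho, step_size=1.0, num_steps=k)
    print(f"K={k}: objective={toy_objective(delta_k):.6f}")
```

On this toy problem the value reached after $K$ steps increases with $K$ and approaches the constrained maximum, which is the quantity the figure above compares across methods.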

### 4.2 Detailed experimental settings for Figure 1 in Section 4.6

| Hyperparameter | 6CNN | WRN28-2 |
| --- | --- | --- |
| Dataset | CIFAR-10 | CIFAR-10 |
| Training set size | 45K | 45K |
| Initial learning rate | {0.001, 0.002, 0.005, 0.01, 0.02, 0.05} | {0.1, 0.03, 0.01} |
| batch_size | {32, 64, 128, 256, 512} | {32, 64, 128, 256, 512} |
| weight_decay | {0.0, 1e-4, 5e-4, 1e-3} | {0.0, 1e-4, 5e-4} |
| Learning rate schedule | constant | {cosine annealing, multi-step LR} |
| Data augmentation | False | {True, False} |
| Label smoothing | - | - |
| Epochs | until convergence (< 400) | {150, 200, 300} |
| $K$ | 3 | 1 |

We trained 6CNN and WRN28-2 with SGD to investigate the relation between the generalization gap and local inconsistency. Each 6CNN hyperparameter combination was trained with 5 seeds to ensure diversity.
$\mathrm{Tr}(H)$ and $\lambda_{\max}(H)$ were computed on 2,000 training samples, and $S_\rho$ was computed on a held-out set of 5K unlabeled samples.
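
This README does not state how $\mathrm{Tr}(H)$ and $\lambda_{\max}(H)$ were estimated. One common approach that needs only Hessian-vector products is Hutchinson's trace estimator together with power iteration. The sketch below illustrates both on a hypothetical explicit 2×2 matrix. The names `hutchinson_trace`, `power_iteration`, and `toy_hvp` are assumptions for illustration, and in a real network the Hessian-vector product would come from automatic differentiation rather than a stored matrix.

```python
import math
import random


def hutchinson_trace(hvp, dim, num_probes, rng):
    """Estimate Tr(H) as the average of v^T H v over Rademacher probes v."""
    total = 0.0
    for _ in range(num_probes):
        v = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
        hv = hvp(v)
        total += sum(vi * hvi for vi, hvi in zip(v, hv))
    return total / num_probes


def power_iteration(hvp, dim, num_iters, rng):
    """Estimate the eigenvalue of largest magnitude of H.

    This equals lambda_max(H) when H is positive semi-definite,
    which holds for the toy matrix below but not for every Hessian.
    """
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    eigenvalue = 0.0
    for _ in range(num_iters):
        norm = math.sqrt(sum(vi * vi for vi in v))
        v = [vi / norm for vi in v]
        hv = hvp(v)
        eigenvalue = sum(vi * hvi for vi, hvi in zip(v, hv))  # Rayleigh quotient
        v = hv
    return eigenvalue


# Hypothetical symmetric matrix standing in for the Hessian.
TOY_H = [[2.0, 1.0], [1.0, 3.0]]


def toy_hvp(v):
    """Hessian-vector product for the explicit toy matrix."""
    return [sum(h * vi for h, vi in zip(row, v)) for row in TOY_H]


rng = random.Random(0)
trace_estimate = hutchinson_trace(toy_hvp, dim=2, num_probes=2000, rng=rng)
lambda_max_estimate = power_iteration(toy_hvp, dim=2, num_iters=100, rng=rng)
print(f"Tr(H) estimate: {trace_estimate:.3f}")
print(f"lambda_max(H) estimate: {lambda_max_estimate:.6f}")
```

For the toy matrix the exact values are $\mathrm{Tr}(H) = 5$ and $\lambda_{\max}(H) = (5 + \sqrt{5})/2$, so the estimates can be checked directly.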

Note that a few models with high training error were excluded after training. For 6CNN, the local inconsistency $S_\rho(\theta)$ estimated with $K = 1$ and with $K = 3$ is highly correlated. Thus, $S_\rho(\theta)$ with $K = 1$ also shows $\tau \approx 0.5$.

### 4.3 Hyperparameter sensitivity in IAM-D and IAM-S
For the image classification tasks, $\beta$ and $\rho$ were tuned via grid search over $\beta \in \{0.1, 1.0, 5.0, 10.0, 20.0\}$ and $\rho \in \{0.01, 0.05, 0.1, 0.5, 1.0\}$, using 10% of the training dataset as a validation split.
As seen in the heatmaps below, the best $(\beta, \rho)$ pairs are $(1.0, 0.1)$ for CIFAR-10 and $(10.0, 0.1)$ for CIFAR-100. For both datasets, $\beta$ and $\rho$ exhibit a trade-off. Furthermore, IAM appears to be more sensitive to $\rho$ than to $\beta$.

<p float="left">
  <img src="./figures/heatmap10.png" alt="CIFAR-10_Heatmap" width="45%" />
  <img src="./figures/heatmap100.png" alt="CIFAR-100_Heatmap" width="45%" />
</p>
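
The grid described above can be enumerated with a short script. The following is a minimal sketch that only builds the 25 command lines per dataset from the flags already shown in Section 3. The helper name `build_grid_commands` is an assumption, and the sketch does not cover the validation split or seeding, since the corresponding options of `train.py` are not documented here.

```python
import itertools
import shlex

# Grid values taken from the text above.
BETAS = [0.1, 1.0, 5.0, 10.0, 20.0]
RHOS = [0.01, 0.05, 0.1, 0.5, 1.0]


def build_grid_commands(dataset):
    """Return one train.py command line per (beta, rho) pair."""
    commands = []
    for beta, rho in itertools.product(BETAS, RHOS):
        args = [
            "python3", "train.py",
            "--dataset", dataset,
            "--optimizer", "IAM",
            "--beta", str(beta),
            "--rho", str(rho),
        ]
        commands.append(shlex.join(args))
    return commands


if __name__ == "__main__":
    for command in build_grid_commands("CIFAR-10"):
        print(command)
```

Each printed line can be launched by hand or passed to a job scheduler, and the pair with the lowest validation error is then selected.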


Test error of IAM-S on CIFAR-100 for different values of $\rho$, compared with SGD.
<p align="center">
  <img src="./figures/IAM-S_rho.png" alt="Test error of IAM-S on CIFAR-100 for different rho" width="512"/>
</p>

$\rho = 0.5$ gives the lowest test error for IAM-S trained on CIFAR-100.

---

## 5  Third‑party code & licenses

| File(s)                                                 | Origin                                                                                                                                                                                     | License     |
| ------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ----------- |
| `src/model.py`, `src/data.py` | Adapted from the PyTorch implementation of *Sharpness‑Aware Minimization* by **David Samuel**, [https://github.com/davda54/sam](https://github.com/davda54/sam) | MIT License |

All adapted sections retain the original MIT license header.  The full license text is provided in **`LICENSE.third_party`** as required by NeurIPS policy.

---
© 2025 Anonymous Author(s)  — released under the MIT License.
