# Quantifying and Optimizing Simplicity via Polynomial Representations

![Method Overview](assets/overview_white.png)  
> **Figure 1:** Overview of the method. We approximate the neural network's behavior along data-dependent interpolation paths using polynomial surrogates. The *Effective Degree (ED)* of these polynomials serves as a measure of functional complexity.
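To make the idea in Figure 1 concrete, here is a minimal sketch of a polynomial surrogate along a single interpolation path. This README does not give the paper's exact ED formula. The energy-weighted Legendre degree below, the helper name `effective_degree`, and the choice to ignore the constant term are our own illustrative assumptions, not the repository's implementation.

```python
import numpy as np
from numpy.polynomial import legendre


def effective_degree(values, max_degree: int = 7) -> float:
    """Hypothetical ED: energy-weighted degree of a Legendre fit along one path.

    `values` are scalar model outputs f(alpha_i) at equally spaced alpha in [0, 1],
    endpoints included, so len(values) plays the role of the sampling resolution.
    """
    values = np.asarray(values, dtype=float)
    alphas = np.linspace(0.0, 1.0, len(values))
    t = 2.0 * alphas - 1.0                            # map [0, 1] -> [-1, 1], Legendre's domain
    coeffs = legendre.legfit(t, values, max_degree)   # least-squares fit, low -> high degree
    energy = coeffs[1:] ** 2                          # drop degree 0: a constant carries no shape
    total = energy.sum()
    if total < 1e-12:                                 # (near-)constant along the path
        return 0.0
    degrees = np.arange(1, max_degree + 1)
    return float((degrees * energy).sum() / total)
```

Under this definition a path on which the output changes linearly scores 1, and a pure quadratic scores 2. For vector outputs (e.g. logits), one could apply it per output dimension, or after a PCA projection, and average; that aggregation is likewise an assumption.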

This repository contains the official implementation of the paper:  
**Quantifying and Optimizing Simplicity via Polynomial Representations**.

**Authors:** *Anonymous Authors* (Under Review)


## Installation

Clone the repository and create the conda environment for the experiments you plan to run.

For the RL experiments:
```bash
# Install dependencies
conda env create -f environment_rl.yaml
```
For all other experiments:
```bash
# Install dependencies
conda env create -f environment.yaml
```

---

## Correlation Experiments

We provide scripts to analyze the correlation between Effective Degree (ED) and the generalization gap on both **CIFAR-10** and **ImageNet**.

### CIFAR-10 Experiments

#### 1. Generate Model Pool

First, train a diverse set of models with varying hyperparameters to establish the model pool.

- **For ResNet:**

```bash
bash corr_resnet.sh
```

- **For ViT-Tiny:**

```bash
bash corr_vit_tiny.sh
```

#### 2. Batch Evaluate ED

Once the model pool is generated, navigate to the `poly` directory and run the evaluation script to batch-evaluate the Effective Degree (ED):

```bash
cd poly
# Batch evaluation of the Effective Degree
bash eval_abd.sh
```
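After batch evaluation, each model's ED can be compared with its generalization gap. This README does not say which correlation statistic the scripts report. The sketch below assumes Kendall's rank correlation (τ) and defines the gap as train accuracy minus test accuracy. Both choices, and the helper names, are hypothetical. If SciPy is available, `scipy.stats.kendalltau` computes tau-b, which equals the tau-a below when there are no ties.

```python
from itertools import combinations
from typing import Sequence


def generalization_gap(train_acc: float, test_acc: float) -> float:
    """Assumed definition of the gap: train accuracy minus test accuracy."""
    return train_acc - test_acc


def kendall_tau(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Kendall's tau-a: (concordant - discordant) / number of pairs; ties count as neither."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    total = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / total
```

Given per-model lists `eds` and `gaps` collected from the model pool, `kendall_tau(eds, gaps)` close to +1 would mean that models with higher ED consistently have larger gaps.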

### ImageNet Experiments

To analyze the correlation on ImageNet (using Model Soups), navigate to the `model-soups` directory.

> **Note:** You may need to download the corresponding model weights from the [model-soups repository](https://github.com/mlfoundations/model-soups.git) before running these experiments.

```bash
cd model-soups

# Evaluate Generalization Gap
bash run_gg.sh

# Evaluate Effective Degree (ED)
bash run_wd.sh
```

---

## Grokking Experiments

We also provide scripts to study how the Effective Degree (ED) evolves during **grokking**.

To reproduce our grokking results on modular division ($p = 97$, with 30% of all equations used for training):

1. Navigate to the `grokking/scripts` directory:

```bash
cd grokking/scripts
```

2. Run the following command:

```bash
WANDB_NAME="div_p97_frac0.3_pair_200_sharpness_every_100" CUDA_VISIBLE_DEVICES=0 \
python train_grokk.py dataset=mod_division_dataset dataset.p=97 dataset.frac_train=0.3 \
wd.use_pca=false wd.num_pairs=200 train.eval_every=100
```

This setup reproduces the behavior where the Effective Degree (ED) closely tracks the validation loss throughout the grokking process.
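For reference, the modular-division task asks the model to predict $c = a \cdot b^{-1} \bmod p$. The sketch below enumerates every such equation for a prime $p$ and splits them according to `frac_train`. The function name mirrors the `mod_division_dataset` config, but it is a hypothetical stand-in. The operand ranges and shuffling used by `train_grokk.py` are assumptions.

```python
import random


def mod_division_dataset(p: int = 97, frac_train: float = 0.3, seed: int = 0):
    """Hypothetical sketch: all equations a / b = c (mod p) with b != 0, split train/val.

    Division is multiplication by the modular inverse of b, which exists for every
    nonzero b because p is prime.
    """
    examples = []
    for a in range(p):
        for b in range(1, p):
            c = (a * pow(b, -1, p)) % p  # pow(b, -1, p) needs Python 3.8+
            examples.append((a, b, c))
    rng = random.Random(seed)
    rng.shuffle(examples)
    n_train = int(frac_train * len(examples))
    return examples[:n_train], examples[n_train:]
```

With `p=97` there are 97 × 96 = 9312 equations, so `frac_train=0.3` leaves 2793 for training.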

---

## ED Regularization Experiments

We provide scripts to reproduce the ED regularization results from the paper.

### CIFAR-10 (ViT-Tiny)

For training details, please refer to the [`run.sh`](run.sh) script and [`train_wd_regular_torch.py`](train_wd_regular_torch.py).

```bash
python train_wd_regular_torch.py \
    --net "vit_tiny" \
    --set_seed 0 \
    --save_net "vit_tiny_ED_reg_pair256_resolution15_max_degree7_lambda2.0" \
    --plot_path "./images/wd_reg" \
    --use_data_aug \
    --lr 0.005 \
    --lambda_reg 2.0 \
    --opt "AdamW" \
    --epochs 300 \
    --resolution 15 \
    --max_degree 7 \
    --nums_pairs 256 \
    --label \
    --warmup_epochs_for_lambda 100
```
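`--warmup_epochs_for_lambda` is not listed in the hyperparameter table. Here is a minimal sketch of one plausible reading. It assumes the ED weight ramps linearly from 0 to `--lambda_reg` over the warmup epochs and that the ED penalty is added to the task loss. The schedule shape and both helper names are hypothetical.

```python
def lambda_at_epoch(epoch: int, lambda_reg: float = 2.0, warmup_epochs: int = 100) -> float:
    """Hypothetical linear warmup of the ED regularization weight."""
    if warmup_epochs <= 0:
        return lambda_reg
    return lambda_reg * min(1.0, epoch / warmup_epochs)


def regularized_loss(task_loss: float, ed_penalty: float, epoch: int, **schedule) -> float:
    """Assumed objective: task loss plus the (warmed-up) weighted ED penalty."""
    return task_loss + lambda_at_epoch(epoch, **schedule) * ed_penalty
```

With the command above (`--lambda_reg 2.0`, `--warmup_epochs_for_lambda 100`, `--epochs 300`), this reading gives full regularization strength for the last 200 epochs.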

#### Hyperparameters

| Argument | Description |
| --- | --- |
| `--lambda_reg` | Strength of the Effective Degree (ED) regularization. |
| `--nums_pairs` | Number of point pairs sampled within each batch. |
| `--resolution` | Number of interpolation points on the path between the two points of each pair (endpoints included). |
| `--max_degree` | Maximum degree of the fitted polynomial. |
| `--label` | Enable the **Label-Anchored** strategy. |
| `--pca k` | (Optional) Reduces the output dimensionality to `k` dimensions using PCA. |
| `--random_alpha` | (Optional) Enables the **Randomized Cosine Sampling** strategy. |
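The sketch below shows how `--nums_pairs` and `--resolution` could translate into model inputs. Pairs are drawn from the current batch, and each path is sampled at `resolution` evenly spaced points, endpoints included. Straight-line interpolation in input space, uniform spacing, and the helper name `interpolation_paths` are assumptions. The `--label` and `--random_alpha` strategies are not modeled.

```python
import numpy as np


def interpolation_paths(batch: np.ndarray, nums_pairs: int, resolution: int, seed: int = 0):
    """Hypothetical sketch: build `nums_pairs` linear paths between distinct batch samples.

    Returns an array of shape (nums_pairs, resolution, *batch.shape[1:]).
    """
    rng = np.random.default_rng(seed)
    n = batch.shape[0]
    if n < 2:
        raise ValueError("need at least two samples to form a pair")
    # Pick two distinct indices per pair: a nonzero offset guarantees idx_b != idx_a.
    idx_a = rng.integers(0, n, size=nums_pairs)
    idx_b = (idx_a + rng.integers(1, n, size=nums_pairs)) % n
    alphas = np.linspace(0.0, 1.0, resolution)  # includes alpha = 0 and alpha = 1
    # Reshape alphas to (1, R, 1, ..., 1) so they broadcast against (P, 1, *features).
    a = alphas.reshape((1, resolution) + (1,) * (batch.ndim - 1))
    x_a, x_b = batch[idx_a][:, None], batch[idx_b][:, None]
    return (1.0 - a) * x_a + a * x_b
```

For a CIFAR-10 batch of shape `(B, 3, 32, 32)` with `--nums_pairs 256 --resolution 15`, this yields a `(256, 15, 3, 32, 32)` tensor that can be flattened to `(3840, 3, 32, 32)` for a single forward pass.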

For a complete list of available hyperparameters and their default values, please consult [`options.py`](options.py).

---

### ImageNet from scratch (ViT-S/16)


To train ViT-S/16 on ImageNet from scratch, see the commands in [`imagenet/imagenet_origin.sh`](imagenet/imagenet_origin.sh) for the original setting and in [`imagenet/imagenet_strongbaseline.sh`](imagenet/imagenet_strongbaseline.sh) for the stronger-baseline setting.

### ImageNet Fine-tuning (CLIP)


For ImageNet fine-tuning with CLIP, please refer to the script commands provided in [`wise-ft/run.sh`](wise-ft/run.sh).


### NLP (GLUE Benchmark)

For the GLUE experiments, see [`bert/run_glue_reg.sh`](bert/run_glue_reg.sh) for training with ED regularization and [`bert/run_glue_mixup.sh`](bert/run_glue_mixup.sh) for the mixup setting.


### RL (Procgen)

For reinforcement learning experiments on Procgen, please refer to the script commands provided in [`rl/ppo_procgen.sh`](rl/ppo_procgen.sh).

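### Recommended Hyperparameters

For most tasks, we recommend the following configuration, which balances performance and speed:

- Sampling resolution ($r$): **4**
- Max degree ($d_{\max}$): **3**
- Sampled pairs ($n_p$): **batch size / 2**

In `train_wd_regular_torch.py`, these correspond to `--resolution`, `--max_degree`, and `--nums_pairs`.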
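To get a rough sense of the cost of this configuration, the sketch below counts how many path points need a forward pass per batch. With $r = d_{\max} + 1 = 4$ samples, the degree-3 polynomial is determined exactly by the points, so the fit interpolates them. The helper names are hypothetical. Whether the scripts reuse the endpoint outputs from the main batch is an assumption, exposed here as a flag.

```python
def recommended_ed_config(batch_size: int) -> dict:
    """The recommended settings above, expressed as training-flag values."""
    return {"resolution": 4, "max_degree": 3, "nums_pairs": batch_size // 2}


def extra_forward_points(cfg: dict, reuse_endpoints: bool = True) -> int:
    """Path points per batch that need their own forward pass.

    Assumption: if each path's two endpoints are batch samples whose outputs are
    reused, only the resolution - 2 interior points are new.
    """
    per_path = cfg["resolution"] - 2 if reuse_endpoints else cfg["resolution"]
    return cfg["nums_pairs"] * per_path
```

For a batch size of 128, this gives 64 pairs and 128 extra forward evaluations (one additional batch's worth) when endpoints are reused, or 256 when they are not.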

---