# Is Grokking a Computational Glass Relaxation?

This repository is the official implementation of the paper "Is Grokking a Computational Glass Relaxation?", which is under review for NeurIPS 2025. By framing grokking as a computational glass relaxation, the paper explains grokking from the perspective of Boltzmann entropy and proposes a physics-based, grokking-resistant optimizer.

This repository covers two experiments from the paper:
* (1) Using WLMD to sample the entropy landscape of modular arithmetic tasks.
* (2) Using the WanD optimizer to train on the modular addition task.
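
For readers unfamiliar with the task, the sketch below shows one common way to build a modular addition dataset: enumerate every pair `(a, b)`, label it with `(a + b) mod p`, and split the pairs into train and test sets. The function name, the modulus `p=97`, the 50% split, and the seed are illustrative assumptions, not the configuration used by the scripts in this repository.

```python
import numpy as np


def make_modular_addition_data(p=97, train_fraction=0.5, seed=0):
    """Enumerate all pairs (a, b) with label (a + b) mod p and split them.

    All defaults here are hypothetical; check the repository scripts for
    the values actually used in the paper.
    """
    # Build every (a, b) pair with 0 <= a, b < p.
    a, b = np.meshgrid(np.arange(p), np.arange(p), indexing="ij")
    inputs = np.stack([a.ravel(), b.ravel()], axis=1)  # shape (p * p, 2)
    labels = (inputs[:, 0] + inputs[:, 1]) % p

    # Shuffle the pairs reproducibly, then cut into train and test parts.
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(inputs))
    n_train = int(train_fraction * len(inputs))
    train_idx, test_idx = perm[:n_train], perm[n_train:]
    return (inputs[train_idx], labels[train_idx]), (inputs[test_idx], labels[test_idx])
```

Calling `make_modular_addition_data(p=7)` returns 24 training pairs and 25 test pairs, which together cover all 49 combinations.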

## Requirements

To install requirements:

```setup
pip install -r requirements.txt
```

## Training

To train the model with the WanD optimizer, run this command:

```train
bash training_WanD.sh
```
We provide a preset of hyperparameters close to the configuration used for Figure 3 of the paper. Since WanD is a toy optimizer, the results may not be ideal; we may improve it in the future.

To output the training results, run the following command:

```read
python read_WanD.py
```

## Entropy sampling

To sample the entropy landscape with WLMD, run the following command:
```sampling
bash grokking_WLMD.sh
```

By default, the script runs 8 processes on 4 GPUs; adjust this to match your own hardware. Note that an accurate entropy landscape usually requires each process to run for tens to hundreds of millions of steps, which takes several months. The default hyperparameters in the program correspond to the results in Figure 2(c) of the paper. The other entropy landscapes in the paper can be obtained by changing the corresponding parameters in the program.
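
When adapting the process count to a different machine, a simple round-robin mapping from process rank to GPU index is one option. The helper below is a hypothetical sketch and is not taken from `grokking_WLMD.sh`, which may distribute processes differently; `assign_gpus` and its parameters are names introduced here for illustration.

```python
def assign_gpus(num_processes=8, num_gpus=4):
    """Return the GPU index for each process rank, assigned round-robin.

    Hypothetical helper: the repository script may use another scheme.
    """
    if num_processes < 1 or num_gpus < 1:
        raise ValueError("num_processes and num_gpus must be positive")
    return [rank % num_gpus for rank in range(num_processes)]


if __name__ == "__main__":
    # Show which GPU each process would be pinned to, e.g. through the
    # CUDA_VISIBLE_DEVICES environment variable.
    for rank, gpu in enumerate(assign_gpus()):
        print(f"process {rank}: CUDA_VISIBLE_DEVICES={gpu}")
```

With the default 8 processes and 4 GPUs, each GPU hosts two processes.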

To plot the entropy landscape, run the following command:

```plot
python plot.py entropyXXM.npz
```
A sample file is provided in the "sample" folder; running the plot command on it reproduces Figure 1(a) of the paper.
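
To check what an `.npz` file contains before plotting it, the arrays can be listed with NumPy. This is a minimal sketch: the array names stored by the sampling script are not documented here, so the helper (a name introduced for illustration) simply reports whatever it finds.

```python
import numpy as np


def summarize_npz(path):
    """Return {array_name: (shape, dtype)} for every array in an .npz file."""
    # np.load on an .npz archive returns a lazy, dict-like NpzFile;
    # the context manager closes the underlying file afterwards.
    with np.load(path) as data:
        return {
            name: (data[name].shape, str(data[name].dtype))
            for name in data.files
        }
```

Pass it the same file name you would give to `plot.py`, then print the returned dictionary to see the stored array names, shapes, and dtypes.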