# ***Anti-Correlated Noise in Epoch-Based Stochastic Gradient Descent: Implications for Weight Variances in Flat Directions - Source Code***

## **Calculation time and resources**

All code was run on an Nvidia Titan RTX. Runtimes:

| program | runtime |
| ----------- | ----------- |
| main.py | ~ 24h |
| comparison_with_replacement.py | ~ 24h |
| comparison_feng.py | < 1h |
| comparison_hyperparameters.py | ~ 72h |
| appendix_hessian_projection.py | ~ 24h |
| comparison_test_accuracy.py | ~ 3h |
| appendix_resnet.py | ~ 24h |
| appendix_commutativity.py | ~ 1min |

## **Requirements**

The code depends on numpy, scipy, matplotlib and tensorflow.
A conda environment file is provided to install these requirements:

```setup
conda env create -f environment.yml
```
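As a quick sanity check after creating the environment, a minimal sketch like the following reports which of the listed packages cannot be found. The helper name `missing_packages` is hypothetical and not part of the repository; it only uses the standard library.

```python
import importlib.util

# Package names taken from the dependency list above.
REQUIRED = ("numpy", "scipy", "matplotlib", "tensorflow")


def missing_packages(names=REQUIRED):
    """Return the names that cannot be found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("missing packages:", missing if missing else "none")
```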


## **Main text analysis**

To run the main analysis, run:

```main
python main.py
```

The script creates the plots from the main text as well as the plot of the Hessian eigenvalue density found in the appendix. To do this, it trains the LeNet as described in the main text. After the initial training schedule, it approximates the Hessian eigenvectors corresponding to the 5,000 largest Hessian eigenvalues. The network is then trained for 20 further epochs with the learning rate from the end of the initial training. During these 20 epochs, the weights, the mini-batch gradient and the full-batch gradient are projected onto the approximated Hessian eigenvectors at each update step and stored.

From the recorded data, the noise term of each update step is calculated (full-batch gradient minus mini-batch gradient), and its averaged autocorrelation function is plotted together with the theory prediction from the main text. The weight variance and the velocity variance are also plotted against the corresponding Hessian eigenvalue, as is the correlation time together with its theory prediction from the main text.
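The noise computation described above can be sketched roughly as follows. This is an illustration, not the code in main.py: the array layout `(steps, directions)`, the function name and the normalisation to lag 0 are assumptions made here.

```python
import numpy as np


def noise_autocorrelation(full_grad, mini_grad, max_lag):
    """Autocorrelation of the SGD noise, averaged over time and directions.

    full_grad, mini_grad: arrays of shape (steps, directions) holding the
    gradients projected onto the eigenvectors (hypothetical layout).
    Returns an array of length max_lag + 1, normalised so that lag 0 is 1.
    """
    noise = full_grad - mini_grad  # full-batch minus mini-batch gradient
    steps = noise.shape[0]
    acf = np.empty(max_lag + 1)
    for lag in range(max_lag + 1):
        # Product of the noise with itself shifted by `lag` update steps,
        # averaged over all time steps and all projected directions.
        acf[lag] = np.mean(noise[: steps - lag] * noise[lag:])
    return acf / acf[0]


# Toy data standing in for the recorded projections (white noise).
rng = np.random.default_rng(0)
full = np.zeros((1000, 8))
mini = rng.normal(size=(1000, 8))
acf = noise_autocorrelation(full, mini, max_lag=5)
```

For the uncorrelated toy noise used here, the values at non-zero lags are close to zero; the recorded training data would replace `full` and `mini`.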

## **Comparison for drawing with replacement**

To run the comparison analysis for training when the examples are drawn with replacement, run (after running main.py):

```comparison_with_replacement
python comparison_with_replacement.py
```

This second script creates the remaining plots for the appendix section that compares against drawing the examples with replacement. It uses the trained network and the Hessian eigenvectors from the main.py script and then performs a similar analysis, except that the examples are drawn with replacement during the recorded analysis period.
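The difference between the two sampling schemes can be illustrated with a small NumPy sketch. It is not taken from the repository: the function names are hypothetical, and drawing each index independently (so that repeats within a batch are possible) is an assumption about what "with replacement" means here.

```python
import numpy as np


def batch_indices(n_examples, batch_size, n_epochs, replacement, rng):
    """Yield one index array per update step.

    replacement=False: every epoch is a fresh permutation (epoch-based SGD).
    replacement=True: every index is drawn independently from all examples.
    """
    steps_per_epoch = n_examples // batch_size
    for _ in range(n_epochs):
        if replacement:
            for _ in range(steps_per_epoch):
                yield rng.integers(0, n_examples, size=batch_size)
        else:
            perm = rng.permutation(n_examples)
            for s in range(steps_per_epoch):
                yield perm[s * batch_size:(s + 1) * batch_size]


# Toy per-example gradients at fixed weights (stand-in data).
rng = np.random.default_rng(0)
grads = rng.normal(size=(100, 3))
full = grads.mean(axis=0)


def epoch_noise_sum(replacement):
    """Sum of the noise terms (full-batch minus mini-batch) over one epoch."""
    batches = batch_indices(100, 10, 1, replacement, rng)
    return sum(full - grads[idx].mean(axis=0) for idx in batches)


sum_without = epoch_noise_sum(replacement=False)
sum_with = epoch_noise_sum(replacement=True)
```

With fixed weights and a batch size that divides the dataset size, the noise terms of one epoch without replacement sum to zero up to rounding, whereas with replacement they generally do not.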


## **Comparison with Feng and Tu**

To run the analysis that compares our analysis method with the method of Feng and Tu, run:

```comparison_feng
python comparison_feng.py
```

The third script generates the plots for the appendix section that compares our analysis with that of Feng and Tu. It performs an analysis similar to main.py, but the network has no L2 regularization and the analysis is done only for the weights of one layer, to achieve conditions similar to those of Feng and Tu. No autocorrelation function is computed. Instead, the variances and the correlation time are analyzed not only in the Hessian eigenbasis of this one layer, but also in the eigenbasis of the weight covariance matrix obtained from the recorded weight series. Furthermore, the script simulates an artificial stochastic gradient descent in an isotropic quadratic potential and evaluates the variances and the correlation time once without any basis change and then in the eigenbasis of the weight covariance matrix.
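The artificial experiment in the last sentence can be sketched as follows. This is a simplified illustration, not the code in comparison_feng.py: the white Gaussian gradient noise, all parameter values and the function name are assumptions.

```python
import numpy as np


def simulate_sgd(dim=20, steps=5000, lr=0.1, curvature=1.0, noise_std=1.0, seed=0):
    """SGD on L(w) = curvature / 2 * |w|^2 with white Gaussian gradient noise.

    The noise model is an assumption made for this sketch.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(dim)
    trajectory = np.empty((steps, dim))
    for t in range(steps):
        grad = curvature * w + noise_std * rng.normal(size=dim)
        w = w - lr * grad
        trajectory[t] = w
    return trajectory


traj = simulate_sgd()

# Variances without any basis change: one value per coordinate axis.
var_original = traj.var(axis=0, ddof=1)

# Variances in the eigenbasis of the empirical weight covariance matrix.
cov = np.cov(traj, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
var_eigenbasis = (traj @ eigvecs).var(axis=0, ddof=1)
```

In this sketch the variances in the covariance eigenbasis equal the covariance eigenvalues, and their spread is at least as wide as that of the per-coordinate variances even though the potential is isotropic.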


## **Testing different Hyperparameters**

To run the analysis that tests our analysis method for different hyperparameters, run:

```comparison_hyperparameters
python comparison_hyperparameters.py
```

This script generates the plots for the appendix section that tests our analysis method for different hyperparameters. It performs an analysis similar to main.py for different sets of hyperparameters and seeds, and finally creates a plot in which the empirically determined maximum correlation time and the empirically determined crossover value are compared to their theory predictions.


## **Test Accuracy for drawing with and without replacement**

To run the analysis regarding the improvement of the test accuracy for training without replacement, run:

```comparison_test_accuracy
python comparison_test_accuracy.py
```

This script generates the plots for the appendix section regarding the Test Accuracy and the Hessian Eigendecomposition of the Weight Vector. To do so, it approximates even more Hessian eigenvalues and eigenvectors of the LeNet network.


## **Test Accuracy and the Hessian Eigendecomposition of the Weight Vector**

To run the analysis regarding the Hessian Eigendecomposition of the Weight Vector, run (after running main.py):

```appendix_hessian_projection
python appendix_hessian_projection.py
```

This script generates the plots for the appendix section regarding the Test Accuracy and the Hessian Eigendecomposition of the Weight Vector. To do so, it approximates even more Hessian eigenvalues and eigenvectors of the LeNet network.
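This README does not state how the Hessian eigenpairs are approximated. One common matrix-free approach, shown here only as a sketch, is Lanczos iteration on Hessian-vector products via SciPy's `eigsh`. The callable `grad_fn`, the finite-difference step and the toy quadratic loss are assumptions for illustration; the scripts themselves may use a different method.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, eigsh


def top_hessian_eigenpairs(grad_fn, w, k, eps=1e-4):
    """Approximate the k largest Hessian eigenpairs at w using only gradients.

    grad_fn is a hypothetical callable returning the full-batch gradient.
    Hessian-vector products are formed by a central finite difference.
    """
    dim = w.size

    def hvp(v):
        v = np.asarray(v, dtype=float).ravel()
        norm = np.linalg.norm(v)
        if norm == 0.0:
            return np.zeros(dim)
        step = eps / norm
        return (grad_fn(w + step * v) - grad_fn(w - step * v)) / (2.0 * step)

    op = LinearOperator((dim, dim), matvec=hvp, dtype=float)
    # Lanczos iteration for the largest algebraic eigenvalues.
    eigvals, eigvecs = eigsh(op, k=k, which="LA", tol=1e-8)
    order = np.argsort(eigvals)[::-1]  # sort in descending order
    return eigvals[order], eigvecs[:, order]


# Toy quadratic loss with known Hessian diag(1, 2, ..., 50).
curvatures = np.arange(1.0, 51.0)


def toy_grad(w):
    return curvatures * w


vals, vecs = top_hessian_eigenpairs(toy_grad, np.zeros(50), k=5)
```

The Hessian is never formed explicitly, so the memory cost is dominated by the `k` stored eigenvectors rather than by a matrix with one row and column per weight.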


## **Different Network Architecture**

To run the analysis on the ResNet20, run:

```appendix_resnet
python appendix_resnet.py
```

This script generates the plots for the appendix section regarding the analysis of the ResNet20 network. It performs an analysis similar to main.py but uses the ResNet20 instead of the LeNet. Because of the larger network size, it approximates significantly fewer eigenvalues, and it leaves out the autocorrelation analysis.


## **Commutativity Assumption**

To run the analysis regarding the commutativity assumption, run (after running main.py):

```appendix_commutativity
python appendix_commutativity.py
```

This script generates the plots for the appendix section regarding the commutativity assumption analysis.
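This README does not spell out which matrices are assumed to commute or how the script tests this. As a generic illustration only, a relative commutator norm is one simple way to quantify how far two symmetric matrices are from commuting. The function name and the toy matrices below are hypothetical and are not taken from appendix_commutativity.py.

```python
import numpy as np


def relative_commutator_norm(a, b):
    """Return ||AB - BA||_F / (2 ||A||_F ||B||_F).

    The value is 0 if and only if A and B commute, and it never exceeds 1.
    """
    commutator = a @ b - b @ a
    denom = 2.0 * np.linalg.norm(a) * np.linalg.norm(b)
    return np.linalg.norm(commutator) / denom


rng = np.random.default_rng(0)
q, _ = np.linalg.qr(rng.normal(size=(6, 6)))  # random orthonormal basis
a = q @ np.diag(rng.uniform(size=6)) @ q.T    # same eigenvectors as b ...
b = q @ np.diag(rng.uniform(size=6)) @ q.T    # ... but different eigenvalues
c = rng.normal(size=(6, 6))
c = c + c.T                                   # unrelated symmetric matrix

shared_basis = relative_commutator_norm(a, b)
unrelated = relative_commutator_norm(a, c)
```

Matrices that share an eigenbasis give a value at rounding level, while an unrelated symmetric matrix gives a clearly non-zero value.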