# Almost Bayesian: Dynamics of SGD Through Singular Learning Theory
Code for the paper **Almost Bayesian: Dynamics of SGD Through Singular Learning Theory**. 

## Repository Structure
The `scripts` folder contains the scripts for running experiments. Experiments are launched from `main.py`, which can easily be modified to automate runs across different settings. The training process is implemented in `train.py`, and the model architectures are defined in `models.py`.

Additional datasets are also included, as well as the option to use adaptive optimizers such as `AdamW`. While these optimizers fall outside the scope of the main paper, they display dynamics that fit within a more general framework (the time- and space-fractional Fokker-Planck equation), and we encourage exploration of these ideas.

## Running Experiments
Assuming your wandb API key is set and the requirements in `requirements.txt` are installed, you can run experiments by running `main.py`. To adjust the parameters of an experiment, edit the relevant values in that file: the `model_type` variable, the `dset_name` variable, and the dictionaries `model_params`, `dset_params`, `run_params`, and `optimizer_params`.
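
As a rough illustration of the kind of edit described above, the configuration block in `main.py` might look something like the sketch below. Only the six variable names come from this README; every key and value inside them is a hypothetical placeholder, so check `main.py` for the keys it actually expects.

```python
# Sketch of an experiment configuration, assuming main.py reads these six names.
# All keys and values below are hypothetical placeholders, not the real defaults.

model_type = "mlp"    # assumed identifier for an architecture in models.py
dset_name = "mnist"   # assumed identifier for one of the included datasets

model_params = {"hidden_dim": 128}                 # hypothetical architecture setting
dset_params = {"batch_size": 64}                   # hypothetical data-loading setting
run_params = {"epochs": 10, "seed": 0}             # hypothetical run-level settings
optimizer_params = {"lr": 1e-2, "momentum": 0.0}   # hypothetical optimizer settings
```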

Graphs from the paper can be generated using the `GraphGeneration.ipynb` notebook.

### Using New Models and Datasets
The structure of the repository is meant to be adaptable, allowing easy investigation of other models, datasets, and optimizers. In particular, the `train_and_analyze` function from `train.py` can be used with any PyTorch model, provided the model defines two additional methods, `get_weights` and `flatten_weights`. The `train_and_analyze` function then operates like a simple training-testing function: you only need to specify the loss function, optimizer, train/test datasets, and the number of epochs (other optional parameters can also be set). This is meant to encourage the study of the diffusive behaviour of different optimizers and model types beyond the regime specified in the paper.
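
A minimal sketch of a custom model carrying the two extra methods is shown below. The README does not specify what `get_weights` and `flatten_weights` should return, so this assumes the former returns a list of detached parameter tensors and the latter a single 1-D tensor of all parameters; compare against the models in `models.py` before relying on it. The class name `TinyMLP` and its dimensions are illustrative only.

```python
import torch
import torch.nn as nn


class TinyMLP(nn.Module):
    """Illustrative model; the name and layer sizes are hypothetical."""

    def __init__(self, in_dim=4, hidden_dim=8, out_dim=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, x):
        return self.net(x)

    def get_weights(self):
        # Assumed behaviour: detached copies of every parameter tensor.
        return [p.detach().clone() for p in self.parameters()]

    def flatten_weights(self):
        # Assumed behaviour: all parameters concatenated into one 1-D tensor.
        return torch.cat([w.reshape(-1) for w in self.get_weights()])


model = TinyMLP()
flat = model.flatten_weights()
n_params = sum(p.numel() for p in model.parameters())
```

A model of this shape could then be passed to `train_and_analyze` together with a loss function, optimizer, datasets, and epoch count; the exact argument names and order are defined in `train.py` and are not reproduced here.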

