# ASDL: Automatic Second-order Differentiation Library
## Fork for the paper "K-FAC for Modern Neural Network Architectures"

The most important change in this fork is the option to choose between the `'expand'` and `'reduce'` approximations when using K-FAC for linear (`nn.Linear`) or convolutional (`nn.Conv2d`) layers.
Select the approximation by passing the `kfac_linear` and `kfac_conv` arguments to `KfacGradientMaker`.

See the files for [linear modules](/asdl/operations/linear.py) and [convolutional modules](/asdl/operations/conv.py) for the implementations of K-FAC-expand and K-FAC-reduce.
Note that for the graph neural network, the linear modules are expected to have an `n_graph` attribute (among others).
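
To give a rough intuition for the difference between `'expand'` and `'reduce'`, here is a minimal standalone sketch of the input-side Kronecker factor for a linear layer whose weights are shared across an extra axis (e.g., sequence positions). It is not the code in this repository: the function names, the `(N, R, D)` tensor layout, and the scaling constants are assumptions made for illustration, so please refer to the files linked above for the actual implementation.

```python
import torch


def input_factor_expand(a: torch.Tensor) -> torch.Tensor:
    """'expand'-style factor: treat the shared axis R as extra batch entries.

    a has shape (N, R, D): batch size, weight-sharing axis, input features.
    """
    n, r, d = a.shape
    flat = a.reshape(n * r, d)       # (N*R, D)
    return flat.T @ flat / (n * r)   # (D, D)


def input_factor_reduce(a: torch.Tensor) -> torch.Tensor:
    """'reduce'-style factor: average over the shared axis R first."""
    n, r, d = a.shape
    mean_a = a.mean(dim=1)           # (N, D)
    return mean_a.T @ mean_a / n     # (D, D)


torch.manual_seed(0)
a = torch.randn(4, 5, 3)  # N=4 examples, R=5 shared positions, D=3 features
A_expand = input_factor_expand(a)
A_reduce = input_factor_reduce(a)
print(A_expand.shape, A_reduce.shape)
```

In this sketch the two factors coincide when the shared axis has length 1, i.e., for a plain linear layer without weight sharing. The reduce variant forms outer products over `N` vectors instead of `N * R`.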

---

ASDL is an extension library of PyTorch to easily perform **gradient preconditioning** using **second-order information** (e.g., Hessian, Fisher information) for deep neural networks.

<p align="center">
  <img src="https://user-images.githubusercontent.com/7961228/207084513-d696f459-1b6e-48cb-b597-00ec6c4bffe2.png" width="400">
</p>
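
As a toy illustration of what gradient preconditioning means, the sketch below multiplies a gradient by the inverse of a damped curvature matrix. It assumes a small dense matrix and a hypothetical helper named `precondition`; ASDL's gradient makers rely on structured approximations such as K-FAC rather than a dense solve like this one.

```python
import torch


def precondition(grad: torch.Tensor, curvature: torch.Tensor, damping: float) -> torch.Tensor:
    """Return (curvature + damping * I)^{-1} grad (hypothetical helper)."""
    eye = torch.eye(curvature.shape[0], dtype=curvature.dtype)
    return torch.linalg.solve(curvature + damping * eye, grad)


# Toy quadratic loss 0.5 * x^T H x with an ill-conditioned Hessian H
H = torch.diag(torch.tensor([1.0, 100.0], dtype=torch.float64))
x = torch.tensor([1.0, 1.0], dtype=torch.float64)
grad = H @ x  # the raw gradient is dominated by the steep direction

newton_step = precondition(grad, H, damping=0.0)    # undamped: recovers x itself
damped_step = precondition(grad, H, damping=0.01)   # damping shrinks the step slightly
print(grad, newton_step, damped_step)
```

With zero damping, subtracting the preconditioned gradient from `x` lands exactly on the minimum of this quadratic, whereas the raw gradient points almost entirely along the steep coordinate.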

ASDL provides various implementations and **a unified interface** (`GradientMaker`) for gradient preconditioning of deep neural networks. For example, to train your model with gradient preconditioning by the [K-FAC](https://arxiv.org/abs/1503.05671) algorithm, replace the `<Standard>` gradient calculation (i.e., a forward pass followed by a backward pass) with the `<ASDL>` procedure using `KfacGradientMaker`, as follows:

```python
import torch
from asdl.precondition import PreconditioningConfig, KfacGradientMaker

# Net, batch_size, loss_fn, and data_loader are placeholders for your own definitions

# Initialize model
model = Net()

# Initialize optimizer (SGD is recommended)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Initialize KfacGradientMaker
config = PreconditioningConfig(data_size=batch_size, damping=0.01)
gm = KfacGradientMaker(model, config)

# Training loop
for x, t in data_loader:
  optimizer.zero_grad()
  
  # <Standard> (gradient calculation)
  # y = model(x)
  # loss = loss_fn(y, t)
  # loss.backward()

  # <ASDL> ('preconditioned' gradient calculation)
  dummy_y = gm.setup_model_call(model, x)
  gm.setup_loss_call(loss_fn, dummy_y, t)
  y, loss = gm.forward_and_backward()

  optimizer.step()
```

You can apply a different gradient preconditioning algorithm by replacing `gm` with another `XXXGradientMaker(model, config)` (*XXX*: algorithm name, e.g., `ShampooGradientMaker` for the [Shampoo](https://arxiv.org/abs/1802.09568) algorithm) **with the same interface**.
This makes it easy to *switch between and compare* a range of gradient preconditioning algorithms.

## Installation

You can install the latest version of ASDL by running:
```shell
$ pip install git+https://github.com/kazukiosawa/asdl
```

ASDL is tested with Python 3.7 and is compatible with PyTorch 1.13.

## Resources

- [ASDL paper](https://arxiv.org/abs/2305.04684)
- [ASDL poster](./ASDL_HOOML2022_poster.pdf) @ [HOOML2022 workshop](https://order-up-ml.github.io/) at NeurIPS 2022
