### Neural Deep Equilibrium Solvers 

> Code for submission to ICLR 2021 only.

#### 1. Where's the hypersolver code?

The hypersolver code (including HyperAnderson iterations and the initializer) is available in the `hypersolver/` directory. The loss objectives for DEQ-Transformer and multiscale-DEQ are implemented in `deq/models/transformers/deq_transformer.py` and `mdeq/lib/utils/utils.py`, respectively.

#### 2. How to download data?

We provide links to download the data in the appendix of the paper. Instructions for obtaining the data can also be found in the original DEQ and multiscale-DEQ repos.

#### 3. How to train?

To train, you first need to download the pretrained DEQ models. For the WikiText-103 language modeling task, we downloaded the latest version of the pretrained model (as recommended by the DEQ authors) from [here](https://github.com/locuslab/deq/tree/beta) (i.e., the `beta` branch, not `master`). For ImageNet/Cityscapes, we downloaded the publicly released version at [this repo](https://github.com/locuslab/mdeq) and converted the state dict to a newer version. We provide a Jupyter notebook for this conversion in `state_dict_convert.ipynb`.

For example, to train the hypersolver on the Cityscapes pretrained DEQ, you need to:
  - First, download the pretrained model `mdeq_SMALL_Cityscapes.pth` from [this repo](https://github.com/locuslab/mdeq).
  - Open `state_dict_convert.ipynb` and convert the downloaded `.pth` file to the latest version, using the code under "Convert Old Version DEQ to New Version DEQ" in the notebook. Put the converted pretrained model in `mdeq/pretrained_models/`.
  - Run: `python -m torch.distributed.launch --nproc_per_node=4 tools/seg_train_hyper.py --cfg experiments/cityscapes_hyper/seg_mdeq_SMALL_hyper.yaml`.
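The conversion step above usually amounts to renaming keys in the checkpoint's state dict. The following is a minimal sketch of that idea only, not the code from `state_dict_convert.ipynb`: the function name `convert_state_dict` and the prefixes in `RENAME_RULES` are hypothetical placeholders, and the real mapping should be taken from the notebook.

```python
from collections import OrderedDict

# Hypothetical rename rules (old key prefix -> new key prefix).
# These are placeholders for illustration, not the actual DEQ key names.
RENAME_RULES = [("old_block.", "new_block.")]


def convert_state_dict(old_state, rules=RENAME_RULES):
    """Return a copy of `old_state` with key prefixes renamed per `rules`."""
    new_state = OrderedDict()
    for key, value in old_state.items():
        new_key = key
        for old_prefix, new_prefix in rules:
            if key.startswith(old_prefix):
                # Swap only the matching prefix; keep the rest of the key.
                new_key = new_prefix + key[len(old_prefix):]
                break
        if new_key in new_state:
            raise KeyError(f"two old keys map to the same new key: {new_key}")
        new_state[new_key] = value
    return new_state


# Plain values stand in for tensors so the sketch runs without PyTorch.
example = OrderedDict([("old_block.weight", 1), ("head.bias", 2)])
print(list(convert_state_dict(example).keys()))
```

With PyTorch, the same function could be applied to the dict returned by `torch.load` on the downloaded `.pth` file, and the result written back with `torch.save`.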

For ImageNet, we train the hypersolver on only 2% of the training data. After preparing the pretrained model with the same steps as for Cityscapes, run: `python tools/cls_train_hyper.py --cfg experiments/imagenet_hyper/cls_mdeq_SMALL_hyper.yaml --percent 0.02`.
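To illustrate what training on a fixed fraction of the data can look like, here is a minimal sketch that draws a reproducible random subset of example indices. It is an assumption about how a fraction such as `--percent 0.02` could be realized, not the logic in `tools/cls_train_hyper.py`; the helper name `sample_subset_indices` and its `seed` parameter are hypothetical.

```python
import random


def sample_subset_indices(num_examples, percent, seed=0):
    """Return sorted indices covering roughly `percent` of the dataset."""
    if not 0.0 < percent <= 1.0:
        raise ValueError("percent must be in (0, 1]")
    # Keep at least one example even for very small datasets.
    subset_size = max(1, int(round(num_examples * percent)))
    # A dedicated RNG with a fixed seed makes the subset reproducible.
    rng = random.Random(seed)
    return sorted(rng.sample(range(num_examples), subset_size))


indices = sample_subset_indices(num_examples=1000, percent=0.02)
print(len(indices), indices[:5])
```

The resulting index list could then be passed to `torch.utils.data.Subset` to restrict a dataset to the sampled examples.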

For WikiText-103, after downloading the data and the pretrained model (say `A.pth`), run `bash run_wt103_deq_transformer_hyper.sh train --load A.pth --learn_alpha --learn_beta --initializer`.

In all cases, we use 4 GPUs to train and 1 GPU to benchmark inference-time efficiency.