The model and theory behind the implementations in this library are described in our ICML paper "[Mitigating Over-smoothing in Transformers via Regularized Nonlocal Functionals]".
Instructions for reproducing the experiments in the paper are given in the following files, one per experiment:

```
wikitext103/README.md
deit/README.md
segmenter/README.md
```