# Language modeling

This repository is based on the implementation at https://github.com/jdeschena/sdtt/ and its modification at https://github.com/sony/di4c/tree/main/sdtt.

## Preparation
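Install the SDTT codebase with the following commands. The setup targets Python 3.10: the prebuilt flash-attn wheel used below is built for CPython 3.10, CUDA 12, PyTorch 2.6, and Linux x86_64.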

```bash
pip install torch==2.6.0 torchvision==0.21.0 --extra-index-url https://download.pytorch.org/whl/cu124
pip install -r requirements.txt
wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
pip install flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
pip install -e .
```

## Evaluation
We can evaluate each sampler as follows:

```bash
bash run.sh
```

The script reports the entropy and generative perplexity for 8- to 256-step sampling. To use a specific sampler described in the paper, set the variables in `run.sh` according to the table below, and set `MOMENT_TEMP` to the Gumbel temperature you would like to use:

|Sampler Name|USE_MOMENT|SAMPLER|TEMP_SAMPLE|DIM_SELECTION|HALTON|SUBSTEP|
|-:|-|-|-|-|-|-|
|**Random**|true|moment|false|false|false|1|
|**MaskGIT**|true|maskgit|false|false|false|1|
|**Moment**|true|moment|true|true|false|1|
|**Temp**|true|moment|true|false|false|1|
|**Halton**|true|moment|false|true|true|1|
|**U-Moment**|true|moment|false|true|false|1|
|**Hybrid**|true|moment|false|true|true|1|
|**Random+Cache**|true|moment|false|false|false|2|
|**Hybrid+Cache**|true|moment|false|true|true|2|
|**Vanilla** (D.4.1)|false|vanilla|false|false|false|1|
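
As an illustration of how a table row maps to settings, the **Moment** row might translate to assignments like the ones below. This is a minimal sketch that assumes `run.sh` defines these as plain shell variables; the actual layout of `run.sh` may differ, and the `MOMENT_TEMP` value is only a placeholder, not a recommended setting.

```bash
# Sketch of the "Moment" row from the table above.
# Assumption: run.sh exposes these as ordinary shell variables.
USE_MOMENT=true
SAMPLER=moment
TEMP_SAMPLE=true
DIM_SELECTION=true
HALTON=false
SUBSTEP=1

# Gumbel temperature. Placeholder value: choose the one you want to evaluate.
MOMENT_TEMP=1.0

# Print the configuration so it can be checked before launching a run.
echo "sampler=${SAMPLER} use_moment=${USE_MOMENT} temp_sample=${TEMP_SAMPLE}" \
     "dim_selection=${DIM_SELECTION} halton=${HALTON} substep=${SUBSTEP}" \
     "moment_temp=${MOMENT_TEMP}"
```

After editing the variables, rerun `bash run.sh` to evaluate the chosen sampler.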