# Fast Language Generation through Discrete Diffusion Divergence Instruct


### `environment.yml`

```yaml
name: duo
channels:
  - nvidia/label/cuda-12.4.0
  - conda-forge
dependencies:
  - python=3.12.11
  - pip
  - cuda-toolkit=12.4.0
  - pip:
    - transformers==4.38.2
    - datasets==2.15.0
    - torch==2.3.1
    - torchvision==0.18.1
    - torchaudio==2.3.1
    - flash-attn==2.7.4.post1
    - einops==0.7.0
    - wandb==0.21.0
    - tqdm==4.67.1
    - lightning==2.2.1
    - triton==2.2.0
```

-----


## 🏗️ Usage Guide

### 1. Create and Activate the Conda Environment

Before first use, create and activate the environment from the provided `environment.yml`:

```bash
conda env create -f environment.yml
conda activate duo
```
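
Once the environment is active, it can help to confirm that the key pinned packages actually resolved to the versions listed in `environment.yml`. The snippet below is a minimal sketch, not part of this repository: the `check_pins` helper and the `PINS` subset are illustrative names, and the versions are copied from the pip section above.

```python
from importlib.metadata import PackageNotFoundError, version

# Subset of the pins from the pip section of environment.yml (illustrative selection).
PINS = {
    "transformers": "4.38.2",
    "datasets": "2.15.0",
    "torch": "2.3.1",
    "lightning": "2.2.1",
    "flash-attn": "2.7.4.post1",
}


def check_pins(pins):
    """Return {package: (expected, found)} for every pin that is not satisfied."""
    problems = {}
    for name, expected in pins.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            found = None  # package is not installed in this environment
        # Compare on the public version only; wheels may append a local tag such as "+cu121".
        if found is None or found.split("+")[0] != expected:
            problems[name] = (expected, found)
    return problems


if __name__ == "__main__":
    for name, (expected, found) in check_pins(PINS).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

If the script prints nothing, every checked package matches its pin. Any printed line names a package to reinstall before training.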

### 2. Train a Small Model

Before distillation, train a baseline small model using one of the predefined scripts. For example, with the OpenWebText dataset:

```bash
# Train a small model on OpenWebText
source ./script/train_small_owt_mdlm.sh
```

### 3. Distill the Model

```bash
source ./script/distill_openwebtext.sh
```

### 4. Evaluate the Distilled Models

```bash
source ./script/eval_openwebtext.sh
```

### 5. Zero-Shot Evaluation

```bash
source ./script/zero_shot.sh
```


## 📚 References

This repository is built upon:

- **DUO**: ["Diffusion Duality: Curriculum and Consistency for Discrete Diffusion LLMs"](https://arxiv.org/abs/2506.10892). ICML 2025.

