# Quartet: Towards Optimal End-to-End FP4 Training for Large Language Models

## Quickstart 

Create a conda environment and install dependencies (we recommend Python 3.11):

```bash
conda create -n env python=3.11
conda activate env
```

Install the requirements (we recommend installing `torch` from the channel appropriate for your setup and compiling `fast_hadamard_transform` from source):

```bash
pip install -r requirements.txt
```

Run an end-to-end MXFP4 pre-training with pseudo-quantization:
```bash
bash main_setup.sh
```

The above command trains a 30M-parameter Llama-style model on 3B tokens. More training runs can be found in `sweep_tokens_backward.sh`.

## Main Quantization Schemes

 * `AlbertTsengQuantizer` is the MXFP4 rounding scheme proposed by Tseng et al. Quartet uses it for `--g_quant` with `g_quant_kwargs={stochastic=True, rerotate='signs'}`. The other hyper-parameters are used for ablation studies.
 * `QuestMXFP4Quantizer` is the MXFP4-adapted version of QuEST. Quartet uses it for `w_quant` and `a_quant`.
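
To illustrate what MXFP4 pseudo-quantization with stochastic rounding means, here is a minimal pure-Python sketch. It is not the repository's `AlbertTsengQuantizer` or `QuestMXFP4Quantizer`: the function names `round_to_grid` and `mxfp4_pseudo_quantize` are hypothetical, the Hadamard rotation (`rerotate`) is omitted, and the choice of the smallest power-of-two scale that avoids clipping is an assumption made for this example. It relies only on the MXFP4 format itself: blocks of 32 values that share one power-of-two scale, with FP4 E2M1 elements.

```python
import math
import random

# Non-negative magnitudes representable in FP4 E2M1 (the MXFP4 element format).
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 32  # MXFP4 shares one power-of-two scale per block of 32 elements


def round_to_grid(v, stochastic, rng):
    """Round v to the signed FP4 grid; magnitudes above 6 are clipped to 6."""
    sign = -1.0 if v < 0 else 1.0
    mag = abs(v)
    # Find the neighbouring grid points lo <= mag <= hi.
    for lo, hi in zip(FP4_GRID, FP4_GRID[1:]):
        if mag <= hi:
            break
    p_up = (mag - lo) / (hi - lo)  # relative position of mag between lo and hi
    if stochastic:
        # Round up with probability p_up, so the expected result equals mag.
        up = rng.random() < p_up
    else:
        # Round to nearest; ties go up (a simplification in this sketch).
        up = p_up >= 0.5
    return sign * (hi if up else lo)


def mxfp4_pseudo_quantize(values, stochastic=False, seed=0):
    """Quantize to the MXFP4 grid and dequantize back to Python floats."""
    rng = random.Random(seed)
    out = []
    for start in range(0, len(values), BLOCK_SIZE):
        block = values[start:start + BLOCK_SIZE]
        amax = max(abs(v) for v in block)
        if amax == 0.0:
            out.extend(0.0 for _ in block)
            continue
        # Assumption: smallest power-of-two scale with amax / scale <= 6,
        # so that no element is clipped.
        scale = 2.0 ** math.ceil(math.log2(amax / 6.0))
        out.extend(scale * round_to_grid(v / scale, stochastic, rng) for v in block)
    return out


if __name__ == "__main__":
    x = [0.1 * i for i in range(-16, 16)]
    print(mxfp4_pseudo_quantize(x))                   # deterministic rounding
    print(mxfp4_pseudo_quantize(x, stochastic=True))  # stochastic rounding
```

Stochastic rounding keeps the quantized value equal to the input in expectation, which is why it is a natural fit for gradients, while deterministic rounding gives a lower per-element error.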



## MXFP4 Kernels

Please refer to the `quartet-kernel` subdirectory.
