This repository contains the source code for the paper "MQSP: Micro-Query Sequence Parallelism for Linearly Scaling Long Sequence Transformer".

## Directory Structure

### `distributed_transformer`

The core code of MQSP, which implements Micro-Query Sequence Parallelism in PyTorch.

### `MQSP_evaluation`

Training scripts, dataset-related files, and data processing code for sequence parallelism.

## Setup

### Software Dependencies

To run MQSP, you will need a Linux server with NVIDIA A100-SXM4-80GB GPU(s), CUDA 11.4,
GPU driver version 470.82.01, and Python 3.8.13.

To install the required dependencies, run:

```bash
sh init_env.sh
```
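Before running experiments, it can help to confirm that the local environment matches the versions listed above. The following is a minimal sketch, not part of the repository: `report_env` is a hypothetical helper name, and it assumes `python3` is on `PATH` and that `init_env.sh` installs PyTorch. It only prints what it finds and changes nothing.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the MQSP repository): print the Python,
# PyTorch CUDA, GPU and driver versions so they can be compared by eye with
# the requirements above. Missing tools are reported as "missing".
report_env() {
  # Python interpreter version, e.g. 3.8.13
  echo "python: $(python3 -c 'import platform; print(platform.python_version())' 2>/dev/null || echo missing)"
  # CUDA version PyTorch was built against ("None" for CPU-only builds)
  echo "torch CUDA: $(python3 -c 'import torch; print(torch.version.cuda)' 2>/dev/null || echo missing)"
  # GPU model and driver version, one line per visible GPU
  if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
  else
    echo "nvidia-smi: missing"
  fi
}

report_env
```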

### Data Preparation

Obtain the related models and datasets from the [Hugging Face Model Hub](https://huggingface.co/models), then set the following environment variables to their local paths:

```bash
export model_name_or_path=/path/to/model
export datasets_path=/path/to/datasets
```
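A quick pre-flight check can catch an unset or mistyped path before a distributed job starts. This is a sketch under the assumption that both variables point to local directories, as the `/path/to/...` placeholders suggest; `check_dir_var` is a hypothetical helper, not something the repository provides.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of the MQSP repository).
# Usage: check_dir_var VAR_NAME
# Returns 0 if the variable named VAR_NAME is set to an existing directory,
# otherwise prints an error to stderr and returns 1.
check_dir_var() {
  local name="$1"
  local value="${!name:-}"   # indirect expansion: the value of the variable called $name
  if [ -z "$value" ]; then
    echo "error: $name is not set" >&2
    return 1
  fi
  if [ ! -d "$value" ]; then
    echo "error: $name=$value is not a directory" >&2
    return 1
  fi
}
```

For example, `check_dir_var model_name_or_path && check_dir_var datasets_path` returns non-zero and prints an error if either path is unset or missing.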

### Run Experiments

An example command for launching an experiment on a single node with 4 GPUs:

```bash
python -m torch.distributed.launch --nproc_per_node=4 --nnodes=1 --node_rank=0 \
    MQSP_evaluation/train_sp.py \
    --datasets_path="$datasets_path" \
    --model_name_or_path="$model_name_or_path" \
    --sp_method=qasp_overlap \
    --per_device_train_batch_size=160 \
    --per_device_eval_batch_size=160 \
    --max_seq_length=512 \
    --num-epoch 3 \
    --seed 42
```
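The same command can be run across several nodes by changing `--nnodes` and `--node_rank` and pointing every node at a common rendezvous address. The sketch below wraps the example above in a hypothetical `launch_mqsp` function; the environment variable names (`NPROC_PER_NODE`, `NNODES`, `NODE_RANK`, `MASTER_ADDR`, `MASTER_PORT`, `DRY_RUN`) are illustrative choices, while `--master_addr` and `--master_port` are standard `torch.distributed.launch` options. It assumes the training entry point is `MQSP_evaluation/train_sp.py`, the script referenced by the `--help` command.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (not part of the MQSP repository) around the example
# launch command. Defaults reproduce the single-node, 4-GPU run; set DRY_RUN=1
# to print the command instead of executing it.
launch_mqsp() {
  local cmd=(
    python -m torch.distributed.launch
    --nproc_per_node="${NPROC_PER_NODE:-4}"
    --nnodes="${NNODES:-1}"
    --node_rank="${NODE_RANK:-0}"
    --master_addr="${MASTER_ADDR:-127.0.0.1}"   # address of the node with rank 0
    --master_port="${MASTER_PORT:-29500}"       # free port on that node
    MQSP_evaluation/train_sp.py                 # assumed training entry point
    --datasets_path="$datasets_path"
    --model_name_or_path="$model_name_or_path"
    --sp_method=qasp_overlap
    --per_device_train_batch_size=160
    --per_device_eval_batch_size=160
    --max_seq_length=512
    --num-epoch 3
    --seed 42
  )
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "${cmd[*]}"   # show the command, space-separated
  else
    "${cmd[@]}"                 # run it with arguments kept intact
  fi
}
```

Run the same command on every node, changing only `NODE_RANK` (0 on the node at `MASTER_ADDR`, then 1, 2, and so on). For example, on the second of two nodes: `NNODES=2 NODE_RANK=1 MASTER_ADDR=10.0.0.1 launch_mqsp`.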

For the full list of training options, run:

```bash
python MQSP_evaluation/train_sp.py --help
```
