# Unified Sparse Mixture of Experts


## Prerequisites

- PyTorch
- FastMoE: https://github.com/laekov/fastmoe
- Transformers: https://github.com/huggingface/transformers
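
Before launching a run, it can help to confirm that the prerequisites are importable in the active environment. The following is a minimal sketch, not part of this repository: the helper `missing_modules` is hypothetical, and the import names `torch`, `fmoe`, and `transformers` are assumed to correspond to the three packages listed above.

```python
import importlib.util

# Assumed import names for the prerequisites listed above:
# PyTorch -> torch, FastMoE -> fmoe, Transformers -> transformers.
REQUIRED = ["torch", "fmoe", "transformers"]


def missing_modules(names):
    """Return the names that cannot be located as importable top-level modules."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing prerequisites:", ", ".join(missing))
    else:
        print("All prerequisites found.")
```

The check only locates each module without importing it, so it does not verify versions or that FastMoE was built against the installed PyTorch.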

## Usage

##### Pretraining USMoE on enwik8

```
# Train from scratch on the enwik8 dataset
bash run_exp.sh
```
