# Soft Mixture of Experts

PyTorch implementation of Soft Mixture of Experts (Soft-MoE) from ["From Sparse to Soft Mixtures of Experts"](https://arxiv.org/abs/2308.00951v1).
This implementation extends the [`timm`](https://github.com/huggingface/pytorch-image-models) library's `VisionTransformer` class to support Soft-MoE MLP layers.
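
The snippet below is a minimal, self-contained sketch of the Soft-MoE routing described in the paper, written for illustration only. `SoftMoESketch` and its parameter names are hypothetical and are not this repository's API. Each of the `n` experts owns `p` slots. A learnable matrix Φ scores every token against every slot. A softmax over tokens gives the dispatch weights, so each slot is a weighted average of the input tokens. A softmax over slots gives the combine weights, so each output token is a weighted average of the expert outputs. For brevity the sketch omits refinements such as normalizing the inputs and slot parameters.

```python
import torch
import torch.nn as nn


class SoftMoESketch(nn.Module):
    """Hypothetical minimal Soft-MoE layer (illustrative, not this repo's class)."""

    def __init__(self, dim: int, num_experts: int, slots_per_expert: int, hidden_dim: int):
        super().__init__()
        self.num_experts = num_experts
        self.slots_per_expert = slots_per_expert
        # Phi: one learnable d-dimensional vector per slot, shape (d, n * p)
        self.phi = nn.Parameter(torch.randn(dim, num_experts * slots_per_expert) * dim**-0.5)
        # Each expert is a plain two-layer MLP, like the MLP in a ViT block
        self.experts = nn.ModuleList(
            [
                nn.Sequential(nn.Linear(dim, hidden_dim), nn.GELU(), nn.Linear(hidden_dim, dim))
                for _ in range(num_experts)
            ]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b = x.shape[0]  # x: (batch, m tokens, d)
        logits = torch.einsum("bmd,ds->bms", x, self.phi)  # (b, m, n*p)
        dispatch = logits.softmax(dim=1)  # normalized over tokens
        combine = logits.softmax(dim=2)  # normalized over slots
        slots = torch.einsum("bms,bmd->bsd", dispatch, x)  # (b, n*p, d)
        slots = slots.reshape(b, self.num_experts, self.slots_per_expert, -1)
        # Expert i only processes its own p slots
        outs = torch.stack([expert(slots[:, i]) for i, expert in enumerate(self.experts)], dim=1)
        outs = outs.reshape(b, self.num_experts * self.slots_per_expert, -1)
        return torch.einsum("bms,bsd->bmd", combine, outs)  # (b, m, d)


torch.manual_seed(0)
layer = SoftMoESketch(dim=64, num_experts=4, slots_per_expert=2, hidden_dim=128)
x = torch.randn(2, 16, 64)  # 2 images, 16 tokens each, embedding dim 64
y = layer(x)
print(y.shape)  # torch.Size([2, 16, 64])
```

Because the dispatch and combine weights are dense, every token contributes to every slot. The layer is therefore fully differentiable and needs no token dropping or load-balancing loss. The output has the same shape as the input, so a layer like this can stand in for the MLP of a transformer block.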

The code is based on the publicly available [bwconrad/soft-moe](https://github.com/bwconrad/soft-moe) repository.


## Installation
```
pip install -r requirements.txt
pip install -e .
```

