## 2.0.0
* Add gradient checkpointing, FullyShardedDataParallel
* Model releases 
    * (CLIP ViT-L-14 / MPT-1B)
    * (CLIP ViT-L-14 / MPT-1B Dolly)
    * (CLIP ViT-L-14 / RedPajama-3B)
    * (CLIP ViT-L-14 / RedPajama-3B Instruct)
    * (CLIP ViT-L-14 / MPT-7B)
* Remove color jitter when training
* Fix cross-attention bug when calling generate()

## 1.0.0

* Initial code release
* Early model release (CLIP ViT-L-14 / LLaMA-7B)