We have uploaded the core modeling code for the GPT-2, Pythia, and LLaMA architectures used in our papers. These implementations can be integrated with the Hugging Face Transformers library.

- **Pretraining**: We use Transformers together with DeepSpeed for model pretraining.
- **Evaluation**: We use lm-evaluation-harness for model evaluation.
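As an illustration of the pretraining setup, the snippet below is a minimal sketch of a DeepSpeed configuration of the kind the Transformers `Trainer` accepts. It is not the configuration used in our experiments: the helper name `build_deepspeed_config`, the ZeRO stage, and the choice of bf16 are assumptions made for the example. The `"auto"` values are placeholders that the Transformers DeepSpeed integration fills in from its own training arguments.

```python
import json


def build_deepspeed_config(zero_stage: int = 2) -> dict:
    """Build an illustrative DeepSpeed config (hypothetical helper, not from this repo)."""
    return {
        # "auto" lets the Transformers integration copy the value from TrainingArguments.
        "bf16": {"enabled": "auto"},
        # ZeRO stage 2 is an assumed default, chosen only for this example.
        "zero_optimization": {"stage": zero_stage},
        "train_micro_batch_size_per_gpu": "auto",
        "gradient_accumulation_steps": "auto",
        "gradient_clipping": "auto",
    }


config = build_deepspeed_config()
# Serialize to JSON, the format DeepSpeed config files use.
config_json = json.dumps(config, indent=2)
print(config_json)
```

Saving the JSON to a file and passing its path as the `deepspeed` argument of `TrainingArguments` is one way to enable DeepSpeed when training with the Transformers `Trainer`.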

We will open-source all of our code and checkpoints with the camera-ready version.

