# Code for "Muon Outperforms Adam in Tail-End Associative Memory Learning"

## Code Structure

### `train_muon_svd.py`
- **Purpose**: Training code for experiments described in **Section 3.1** and **Section 3.2**
- **Dataset**: FineWeb. Prepare it with the pipeline in the `data` folder.
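
For readers unfamiliar with the optimizer, the following is a rough sketch of a Muon-style update in which the momentum matrix is orthogonalized with an exact SVD. It is an illustration only, not the code in `train_muon_svd.py`: the function names `orthogonalize_svd` and `muon_step`, the hyperparameter values, and the use of NumPy are all assumptions made for this example.

```python
import numpy as np


def orthogonalize_svd(m):
    """Set every singular value of m to 1 by returning U @ Vt from its reduced SVD."""
    u, _, vt = np.linalg.svd(m, full_matrices=False)
    return u @ vt


def muon_step(weight, grad, momentum_buf, lr=0.02, beta=0.95):
    """One simplified Muon-style step for a 2-D weight matrix (hypothetical helper).

    lr and beta are placeholder values, not the settings used in the paper.
    """
    # Accumulate momentum, then orthogonalize it before applying the update.
    momentum_buf = beta * momentum_buf + grad
    update = orthogonalize_svd(momentum_buf)
    return weight - lr * update, momentum_buf


# Toy usage on random data.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 3))
g = rng.standard_normal((4, 3))
buf = np.zeros_like(w)
new_w, buf = muon_step(w, g, buf)
```

Practical Muon implementations often approximate the orthogonalization with a Newton-Schulz iteration instead of an exact SVD; the SVD form is used here only because it is the shortest to write down.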


### `train_qatail.py`
- **Purpose**: Training code for experiments described in **Section 3.3**
- **Dataset**: Synthetic QA dataset. Prepare it following Appendix F.3 of our manuscript and *Zeyuan Allen-Zhu and Yuanzhi Li. Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws. arXiv preprint arXiv:2404.05405, 2024.*


