# Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models

See `code/README.md` for more information.
