This folder contains code for submission 9558.

Requirements:
python=3.10
pytorch==1.12.1 cudatoolkit=11.3

The experiments and analysis methods are described in detail in the Appendices. Please check the manuscript PDF for definitions of the terminology.

Small and large toy models are in the folder "exp": exp-10 and exp-10-3 contain the small toy model experiments (different data exponent and weight decay), exp-15 studies activation density on small toy models, and exp-17 contains the large toy model experiments.

We wrote our own AdamW (adamw.py in the "exp" folder) to allow the negative weight decay introduced in the main text.
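
For readers who want the idea without opening the file, here is a minimal, dependency-free sketch of one AdamW update with decoupled weight decay and no sign check on weight_decay. It is not the code in adamw.py: the function name adamw_step, its arguments, and the list-of-floats representation are assumptions made for illustration only. The stock torch.optim.AdamW rejects negative weight_decay values in its constructor, which is why a custom optimizer is needed.

```python
import math


def adamw_step(param, grad, state, lr=1e-3, betas=(0.9, 0.999),
               eps=1e-8, weight_decay=0.0):
    """One AdamW update on a list of floats (hypothetical helper).

    weight_decay is deliberately not validated: a negative value makes
    the factor (1 - lr * weight_decay) larger than 1, so the weights
    grow instead of shrinking.
    """
    beta1, beta2 = betas
    state["step"] = state.get("step", 0) + 1
    t = state["step"]
    m = state.setdefault("m", [0.0] * len(param))  # first-moment estimate
    v = state.setdefault("v", [0.0] * len(param))  # second-moment estimate
    bias1 = 1.0 - beta1 ** t
    bias2 = 1.0 - beta2 ** t

    new_param = []
    for i, (p, g) in enumerate(zip(param, grad)):
        # Decoupled weight decay, applied directly to the parameter.
        p = p * (1.0 - lr * weight_decay)
        # Exponential moving averages of the gradient and its square.
        m[i] = beta1 * m[i] + (1.0 - beta1) * g
        v[i] = beta2 * v[i] + (1.0 - beta2) * g * g
        # Bias-corrected Adam step.
        m_hat = m[i] / bias1
        v_hat = v[i] / bias2
        new_param.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    return new_param


# With a zero gradient only the decay term acts on the weight.
grown = adamw_step([1.0], [0.0], {}, lr=0.1, weight_decay=-0.1)
shrunk = adamw_step([1.0], [0.0], {}, lr=0.1, weight_decay=0.1)
```

In this sketch, grown[0] ends up slightly above 1.0 and shrunk[0] slightly below it, which is the only behavioral difference a negative weight decay introduces in the update rule.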

Analyses of open-source LLMs are in the "LLMs" folder, where we study the norm distribution, overlap distribution, evaluation loss, and token frequencies.