The code is quite messy as it wasn't designed for presentation: there are a lot of redundant blocks. However, here is a general guide:

compare_overlap_within_and_without.py, rwkv_mt5_number_comparison, and the interpret scripts compare map-based metrics between models.
The proximal_forgetfulness scripts contain the main training loops.
create_figures.py creates some figures from the measured data.
project_classes.py contains many classes and helpers.
mt5_tests.py focuses on more numerical robustness tests.
The state dicts are named according to how they were trained and can be used for various tests. As they are too large to include in the submission, they are available upon request.
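
For anyone who requests the state dicts, here is a minimal sketch of loading one for a test. It assumes the files are ordinary PyTorch state dicts written with torch.save, which this guide does not state explicitly. The helper name load_state_dict_into and the file name example_state_dict.pt are hypothetical, and a tiny linear layer stands in for the real model so the snippet runs on its own.

```python
import os
import tempfile

import torch
from torch import nn


def load_state_dict_into(model: nn.Module, path: str) -> nn.Module:
    """Load a saved state dict from `path` into `model`, mapping tensors to the CPU."""
    state_dict = torch.load(path, map_location="cpu")
    # strict=True is the default, so the keys must match the model exactly.
    model.load_state_dict(state_dict)
    model.eval()  # switch to inference mode for testing
    return model


# Stand-in demo: a small linear layer replaces the real model, and a
# temporary file replaces one of the real (much larger) state dicts.
source = nn.Linear(4, 2)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example_state_dict.pt")  # hypothetical file name
    torch.save(source.state_dict(), path)
    restored = load_state_dict_into(nn.Linear(4, 2), path)
```

In practice the model passed in should be built the same way as in the training script that produced the state dict, otherwise the strict key check will fail.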