# Relationship between ICL and memorization

Hypothesis: Strings that a model can predict well through in-context learning (ICL) will be harder to memorize when the ICL-based and memorization-based predictions disagree on some tokens, because the two mechanisms then compete on those tokens.

- ICL measure: difference between the loss on the first half and the loss on the second half of the string.
- For strings that vary in alphabet size, entropy, and regularity (number of unique vs. repeated tokens), measure ICL and memorization speed, i.e. the first stable memorization epoch of the string. Do the two correlate?
- Look at the following strings:
    - Alphabet sizes: 2 to 26
    - Entropy levels: 2 to 26
    - Uniqueness: 16-token substrings, with 2 to 64 unique substrings per string, i.e. 32/1024 to 1024/1024 unique tokens. Problem: there is no conflict here between ICL and memorization predictions.
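
The uniqueness setting above could be generated roughly as follows. This is a minimal sketch, not the actual generation code: it assumes one token per lowercase character, a total length of 1024 tokens (64 aligned blocks of 16), and that every unique substring appears at least once. The name `make_uniqueness_string` and all its parameters are hypothetical.

```python
import random
import string


def make_uniqueness_string(num_unique, alphabet_size=26, block_len=16,
                           total_len=1024, seed=0):
    """Build a string out of `num_unique` distinct blocks of `block_len` tokens.

    Assumption: tokens are single lowercase characters, and blocks are
    aligned to multiples of `block_len`.
    """
    num_blocks = total_len // block_len
    if not 1 <= num_unique <= num_blocks:
        raise ValueError("num_unique must be between 1 and total_len // block_len")
    if alphabet_size ** block_len < num_unique:
        raise ValueError("alphabet too small for that many distinct blocks")

    rng = random.Random(seed)
    alphabet = string.ascii_lowercase[:alphabet_size]

    # Sample distinct random blocks until there are enough of them.
    unique_blocks = set()
    while len(unique_blocks) < num_unique:
        unique_blocks.add("".join(rng.choice(alphabet) for _ in range(block_len)))
    unique_blocks = sorted(unique_blocks)  # fixed order keeps the seed reproducible

    # Use every block once, fill the rest with repeats, then shuffle positions.
    blocks = unique_blocks + [rng.choice(unique_blocks)
                              for _ in range(num_blocks - num_unique)]
    rng.shuffle(blocks)
    return "".join(blocks)
```

With `num_unique=2` only 32 of the 1024 tokens belong to a first occurrence of a block; with `num_unique=64` every block is new, so nothing in the string can be copied from context.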

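The two measurements in the plan can be sketched as below. This is an illustrative sketch under stated assumptions: the ICL score uses the mean per-token loss of each half with the sign chosen so that positive means the second half is easier, and "stable" is taken to mean that the string is reproduced correctly in that epoch and in every later recorded epoch. Both function names are hypothetical.

```python
from statistics import mean


def icl_score(token_losses):
    """Mean loss on the first half minus mean loss on the second half.

    Assumption: positive values mean the second half is easier to predict,
    i.e. the model benefits from the context in the first half.
    """
    if len(token_losses) < 2:
        raise ValueError("need at least two per-token losses")
    mid = len(token_losses) // 2
    return mean(token_losses[:mid]) - mean(token_losses[mid:])


def first_stable_epoch(memorized_per_epoch):
    """First epoch from which the string stays memorized until the end.

    `memorized_per_epoch[i]` is True if the string was reproduced correctly
    after epoch i. Returns None if the last recorded epoch is not memorized.
    """
    stable = None
    for epoch, memorized in enumerate(memorized_per_epoch):
        if memorized:
            if stable is None:
                stable = epoch  # candidate start of a stable run
        else:
            stable = None  # the run was broken, so start over
    return stable
```

Computing both values per string gives one (ICL score, first stable epoch) pair per string, which is what the correlation question asks about.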