We use a fork of **dolomite-engine** to run our language pretraining experiments. The author information of dolomite-engine is preserved.

In **greedtok**, we anonymize author information.
Running setup.py works much like a regular pip installation.

Most of the compression experiment results can be found in greedtok/eval_notebook. Due to size limitations, we could not include all of our data.

Refer to greedtok/eval_hf for examples of how to use GreedTok.

