We strongly recommend running this code in the [cuML Docker image](https://docs.rapids.ai/install#selector) provided by rapidsai.

The required pip packages are listed in the requirements.txt file.

We provide a database that stores the attention patterns generated by each head in a separate h5py dataset.
To query the database, change the selectors (marked in the notebook) from "*" to the set of keys you want to select.
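
As a rough illustration of how such selectors can work, the sketch below filters dataset names by key. It is not the notebook's actual code: the function name `select_datasets`, the `model/layer/head` naming scheme, and the example names are all assumptions made for this example.

```python
def _matches(part, selector):
    """True if one key matches its selector ("*" matches everything)."""
    if isinstance(selector, str):
        return selector == "*" or part == selector
    return part in set(selector)


def select_datasets(names, model="*", layer="*", head="*"):
    """Filter dataset names by per-level selectors.

    Assumes (hypothetically) that each dataset is named "model/layer/head".
    Each selector is "*", a single key, or a collection of allowed keys.
    """
    selectors = (model, layer, head)
    selected = []
    for name in names:
        parts = name.split("/")
        if len(parts) != len(selectors):
            continue  # skip names that do not follow the assumed layout
        if all(_matches(p, s) for p, s in zip(parts, selectors)):
            selected.append(name)
    return selected


# Hypothetical dataset names, for illustration only.
names = ["model_a/0/0", "model_a/0/1", "model_b/1/0"]
print(select_datasets(names))                                # everything
print(select_datasets(names, model=["model_a"], head=["1"]))  # one head
```

With a real file, the list of names could be collected from an open `h5py.File`, for example with `visititems`, before filtering; the actual key layout in the database may differ from the one assumed here.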

run.ipynb is a notebook that runs the entire pipeline from start to finish. Note that running with the settings from the paper requires 125TB of storage, and that the bottlenecks are disk write speed and GPU -> disk transfer rates. This is mainly because all attention patterns are saved so that the clusters can be evaluated visually. Use generate_and_encode to reduce disk usage to 1TB, at the cost of no longer being able to visualize the patterns.
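
To see why saving every attention pattern dominates disk usage, a back-of-the-envelope estimate can help. The sketch below assumes dense, uncompressed storage of one `seq_len x seq_len` matrix per head, per layer, per prompt. The function name and all numbers are illustrative assumptions, not the settings used in the paper.

```python
def attention_storage_bytes(n_prompts, n_layers, n_heads, seq_len, bytes_per_value=2):
    """Estimate bytes needed to store all attention patterns.

    Assumes one dense seq_len x seq_len matrix per head, per layer,
    per prompt, with no compression (bytes_per_value=2 means float16).
    """
    return n_prompts * n_layers * n_heads * seq_len * seq_len * bytes_per_value


# Illustrative values only; these are NOT the paper's settings.
total = attention_storage_bytes(n_prompts=1000, n_layers=32, n_heads=32, seq_len=512)
print(f"{total / 1e12:.2f} TB")
```

Because the cost grows quadratically with sequence length and linearly with the number of prompts, layers, and heads, storing only encoded representations instead of full patterns reduces the footprint substantially.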

All code can run on a single 80GB A100, or on a 32GB V100 if you run only the 3B and 7B models.