# Training LTM (Language Turing Machine)

## Prerequisites

* In the project root, register the `00_pytracify` directory as a GitHub repository named **`pytracify`**. Then replace **`YourOrg`** with your own organization name in both `uv.lock` and `pyproject.toml`.
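
The placeholder replacement can be scripted. The following is a minimal sketch, not part of the repository: the helper name `replace_org` and the example organization `my-org` are hypothetical, and it assumes the placeholder appears literally as `YourOrg` and that your organization name contains no `/` or `&` characters.

```bash
# Hypothetical helper: replace the literal placeholder "YourOrg" in the given files.
# Usage: replace_org <org-name> <file>...
replace_org() {
  local org="$1"
  shift
  local f
  for f in "$@"; do
    # Write to a temporary file first so the sketch works with both GNU and BSD sed.
    sed "s/YourOrg/${org}/g" "$f" > "$f.tmp" && mv "$f.tmp" "$f"
  done
}

# Example, run from the directory that contains both files ("my-org" is a placeholder):
#   replace_org my-org pyproject.toml uv.lock
#   grep -n "YourOrg" pyproject.toml uv.lock   # prints nothing if every occurrence was replaced
```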

## Installation

Before installing the dependencies, clone the repository and check out the `feat/training` branch.

```bash
# Enter an interactive session to obtain a GPU
srun --partition=a3 --gpus=1 --pty bash
cd training
uv sync
# Make sure to log in to Hugging Face and Weights & Biases
uv run huggingface-cli login
uv run wandb login
```

If you encounter the error `undefined symbol: __nvJitLinkAddData_12_1, version libnvJitLink.so.12`, run:

```bash
export LD_LIBRARY_PATH=.venv/lib64/python3.11/site-packages/nvidia/nvjitlink/lib:$LD_LIBRARY_PATH
```
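
The path above is relative and hard-codes `lib64` and `python3.11`, so it may not match every environment. As a hedged alternative, the sketch below searches the virtual environment for the library and prints its absolute directory. The function name `find_nvjitlink_dir` is hypothetical and not part of the repository.

```bash
# Hypothetical helper: print the absolute directory that contains
# libnvJitLink.so.12 inside a virtual environment (default: .venv).
find_nvjitlink_dir() {
  local venv="${1:-.venv}"
  local lib
  # Take the first match; tolerate a missing directory.
  lib="$(find "$venv" -name 'libnvJitLink.so.12' 2>/dev/null | head -n 1 || true)"
  if [ -z "$lib" ]; then
    echo "libnvJitLink.so.12 not found under $venv" >&2
    return 1
  fi
  # Resolve the containing directory to an absolute path.
  (cd "$(dirname "$lib")" && pwd)
}

# Usage (run from the `training` directory):
#   export LD_LIBRARY_PATH="$(find_nvjitlink_dir .venv):$LD_LIBRARY_PATH"
```

An absolute path keeps the setting valid after you change directories within the same shell session.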


## How to train on CR-Bench

⚠️ Training currently works correctly only on our GCP clusters. On the sakura cluster, it fails with errors that appear to be related to a CUDA incompatibility, which I haven't investigated yet.

### Training

**Command line:**

```bash
# Make sure to enter an interactive session to obtain a node with 8 GPUs

CONFIG=configs/path/to/config
uv run main.py --base $CONFIG
# For debugging, the most common command is:
# uv run main.py --base $CONFIG --debug --devices 0,
```

