## Running Python Scripts Directly

If you want to run the training scripts directly (without SLURM or the shell scripts), follow these steps:

1. **Create and activate a Python virtual environment** (recommended):
   ```bash
   python3 -m venv venv
   source venv/bin/activate
   pip install --upgrade pip
   pip install -r requirements.txt  # if present; otherwise install the needed packages manually
   ```

2. **Run the desired script with your chosen arguments**. For example:
   ```bash
   python3 train_deltanet.py \
     --data_dir data/mm_T150 --alphabet pm1 \
     --cuda --amp --amp_dtype bf16 \
     --layers 2 --d_model 256 --heads 8 --dropout 0.1 \
     --lr 3e-4 --weight_decay 0.001 \
     --batch_size 256 --num_workers 4 \
     --save_path ckpt_deltanet_rowtf_step1_clean.pt
   ```
   Adjust the arguments as needed for your experiment.

3. **Notes:**
   - Install all required Python packages first; the shell scripts list the dependencies.
   - GPU training requires a compatible CUDA environment and a matching PyTorch build.
   - You can set environment variables before running the script to override defaults, e.g.:
     ```bash
     D_MODEL=512 python3 train_transformer.py ...
     ```
   - For best reproducibility, use the shell scripts or SLURM jobs described below.
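
Before a long run, it can help to confirm that PyTorch actually sees a GPU and that bf16 (used by `--amp_dtype bf16` in the example above) is supported. The following is a minimal sketch, not part of this repository: the function name `gpu_preflight` is hypothetical, and it relies only on standard `torch.cuda` calls.

```python
import importlib.util


def gpu_preflight():
    """Report whether PyTorch can see a CUDA device and whether bf16 is usable."""
    report = {
        "torch_installed": False,
        "cuda_available": False,
        "bf16_supported": False,
        "device_name": None,
    }
    if importlib.util.find_spec("torch") is None:
        return report  # PyTorch is not installed in this environment

    import torch

    report["torch_installed"] = True
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        # Only query the device when CUDA is usable; these calls need a GPU.
        report["device_name"] = torch.cuda.get_device_name(0)
        report["bf16_supported"] = torch.cuda.is_bf16_supported()
    return report


if __name__ == "__main__":
    for key, value in gpu_preflight().items():
        print(f"{key}: {value}")
```

If `cuda_available` is reported as false, drop `--cuda` and `--amp` or fix the CUDA/PyTorch installation before training.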

# Matrix Multiplication Model Training Suite

This repository provides scripts and utilities for training and benchmarking several neural network architectures (DeltaNet, Transformer, RNN, RWKV, Mamba) on matrix multiplication tasks. Training is normally launched through shell scripts submitted to SLURM, for reproducibility and ease of use on clusters.

## General Usage

1. **Prepare your data**: Place your dataset in the `data/` directory. Each script expects a specific data folder; check the script for its default, or set the `DATA_DIR` variable to point elsewhere.

2. **Edit hyperparameters**: You can override most training parameters by setting environment variables before running the script, or by editing the script directly.

3. **Run the desired model script**:

### DeltaNet
```bash
sbatch delta.sh
```

### Transformer
```bash
sbatch transformer.sh
```

### RNN
```bash
sbatch rnn.sh
```

### RWKV
```bash
sbatch rwkv.sh
```

### Mamba
```bash
sbatch mamba.sh
```

Each script will set up a virtual environment, install dependencies, and launch training with the specified configuration. Logs and checkpoints will be saved as defined in each script.

## Customization

- To change hyperparameters, either edit the shell script or set environment variables, e.g.:
  ```bash
  D_MODEL=512 BATCH_SIZE=128 sbatch delta.sh
  ```
- To use a different dataset, set `DATA_DIR`:
  ```bash
  DATA_DIR=$PWD/data/my_dataset sbatch transformer.sh
  ```
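
The same environment-variable overrides can be used to submit a small hyperparameter sweep. The sketch below is illustrative only and is not shipped with this repository: the helper names `sweep_commands` and `launch_sweep` are hypothetical, and it assumes `sbatch` passes the submitting environment on to the job (its default behaviour unless `--export` is configured otherwise). It defaults to a dry run that only prints the commands.

```python
import itertools
import os
import subprocess


def sweep_commands(script, grid):
    """Yield (overrides, command) pairs, one per combination of values in grid."""
    names = sorted(grid)
    for values in itertools.product(*(grid[name] for name in names)):
        overrides = {name: str(value) for name, value in zip(names, values)}
        yield overrides, ["sbatch", script]


def launch_sweep(script, grid, dry_run=True):
    """Submit one job per combination; with dry_run=True, only print the commands."""
    jobs = []
    for overrides, command in sweep_commands(script, grid):
        prefix = " ".join(f"{key}={value}" for key, value in overrides.items())
        print(f"{prefix} {' '.join(command)}")
        if not dry_run:
            # Overrides are layered on top of the current environment for this job only.
            subprocess.run(command, env={**os.environ, **overrides}, check=True)
        jobs.append(overrides)
    return jobs


if __name__ == "__main__":
    # Dry run: prints four sbatch commands without submitting anything.
    launch_sweep("delta.sh", {"D_MODEL": [256, 512], "BATCH_SIZE": [128, 256]})
```

Pass `dry_run=False` only after checking the printed commands, since each combination becomes a separate SLURM job.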

## Requirements

- SLURM cluster with GPU nodes
- Python 3.8+

## Notes

- All shell scripts are self-contained and create their own virtual environments in `/tmp` by default.
- For troubleshooting, check the log files specified in each script (e.g., `delta.log`).
