# On the Role of Batch Size in Stochastic Conditional Gradient Methods
This repository contains the code used to reproduce the results of the paper above. It is structured as follows:
- `configs`: configurations used to reproduce the experiments.
- `train_*`: training scripts.
Our code is inspired by the repository of [Pethick et al., 2025](https://arxiv.org/abs/2502.07529).
All experiments were run on 4xH200 GPUs, and the configs are set up for this hardware.

## Setup
Install the dependencies listed in `requirements.txt`, paying particular attention to the `torch` version.
Because of limited compute, there are two versions of the training scripts: one for runs that can complete uninterrupted, and a `*_ckpts` version for long runs, which saves checkpoints and can resume from them.
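The checkpoint-and-resume pattern behind the `*_ckpts` scripts can be illustrated with a minimal, hypothetical sketch. It is not the repository's implementation: the function `train`, its JSON state file, and the dummy update are assumptions, and the parameter names only mirror the `--save_step`, `--ckpt_in`, and `--ckpt_out` flags from the example run. A real PyTorch script would typically store model and optimizer `state_dict()`s with `torch.save`/`torch.load` instead.

```python
import json
import os
import tempfile


def train(total_steps, save_step, ckpt_in=None, ckpt_out="ckpt.json"):
    """Hypothetical resumable training loop (illustration only)."""
    # Resume from an earlier checkpoint if one is given; otherwise start fresh.
    if ckpt_in is not None and os.path.exists(ckpt_in):
        with open(ckpt_in) as f:
            state = json.load(f)
    else:
        state = {"step": 0, "loss_sum": 0.0}

    while state["step"] < total_steps:
        # Placeholder for one optimizer step (e.g. a Scion update).
        state["loss_sum"] += 1.0 / (state["step"] + 1)
        state["step"] += 1
        # Save every `save_step` steps so an interrupted job loses little work.
        if state["step"] % save_step == 0:
            tmp = ckpt_out + ".tmp"
            with open(tmp, "w") as f:
                json.dump(state, f)
            # Atomic rename: a crash mid-save never leaves a half-written checkpoint.
            os.replace(tmp, ckpt_out)
    return state


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "run.json")
        train(total_steps=4, save_step=2, ckpt_out=path)  # first job stops at step 4
        resumed = train(total_steps=10, save_step=2, ckpt_in=path, ckpt_out=path)
        print(resumed["step"])
```

Because the full state is restored, a run split across several jobs ends in the same state as one uninterrupted run.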

## Example run
1. Scion 
```bash
torchrun --standalone --nproc_per_node=4 train_gpt_scion_ckpts.py \
    --config=<CONFIG_PATH> \
    --save_step=<SAVE_EVERY_STEP> \
    --ckpt_in=<RESUME_CKPT_PATH> \
    --ckpt_out=<WHERE_TO_SAVE>
```
2. Restarted Scion
```bash
torchrun --standalone --nproc_per_node=4 train_gpt_rescion_ckpts_2.py \
    --config=<CONFIG_PATH> \
    --resume=<RESUME_CKPT_PATH> \
    --ckpt_out=<WHERE_TO_SAVE> \
    --save_every=<SAVE_EVERY_STEP>
```

**Note:** Make sure you are logged into your W&B account (e.g. via `wandb login`) before training; otherwise nothing will be logged.
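A quick pre-flight check can catch a missing login before a long job starts. This is a hypothetical helper, not part of the repository, and it rests on an assumption: that W&B credentials are available either through the `WANDB_API_KEY` environment variable or as an entry for `api.wandb.ai` in `~/.netrc`, which is where `wandb login` typically stores the key. It only checks that a key is present, not that it is valid.

```python
import netrc
import os


def wandb_credentials_present(netrc_path=None):
    """Return True if a W&B API key appears to be configured.

    Assumption: the key comes from WANDB_API_KEY or from a ~/.netrc entry
    for the host api.wandb.ai (as written by `wandb login`).
    """
    if os.environ.get("WANDB_API_KEY"):
        return True
    path = netrc_path or os.path.expanduser("~/.netrc")
    if not os.path.exists(path):
        return False
    try:
        auth = netrc.netrc(path).authenticators("api.wandb.ai")
    except netrc.NetrcParseError:
        return False
    # authenticators() returns (login, account, password) or None.
    return auth is not None and bool(auth[2])


if __name__ == "__main__":
    if not wandb_credentials_present():
        raise SystemExit("No W&B credentials found; run `wandb login` first.")
```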