Our implementation builds on the released code of "BAdam: A Memory Efficient Full Parameter Optimization Method for Large Language Models". We modify its `BlockOptimizer` class and add arguments that enable landscape correction.

To use `BREAD` or `BREAD-partial`, set `lora_mode` to `all` or `partial`, respectively.

**NOTE**: Make sure the model is a PEFT model, i.e. the LoRA modules have already been added to the model.

To use BREAD, wrap the original optimizer `original_optimizer` with `BreadOptimizer`, which automatically updates the correction matrices and switches the active block.

```python
from bread import BreadOptimizer

# Before training, wrap the original optimizer.
optimizer = BreadOptimizer(
    base_optimizer=original_optimizer,
    named_parameters_list=list(model.named_parameters()),
    # K in Algorithm 1: the number of iterations spent on each sub-problem.
    switch_block_every=100,
    # Options: "all" or "partial".
    # "all" updates all correction matrices (BREAD).
    # "partial" updates only the correction matrices in layers deeper
    # than the active block (BREAD-partial).
    lora_mode="all",
)
```
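To make the role of `switch_block_every` concrete, here is a minimal sketch of how the active block could be derived from the training step. It is not the code inside `BreadOptimizer`: the helper `active_block_index`, the `num_blocks` argument, and the cyclic ascending block order are assumptions made for illustration only.

```python
def active_block_index(step: int, switch_block_every: int, num_blocks: int) -> int:
    """Return the index of the block optimized at a 0-based training step.

    Hypothetical helper: assumes blocks are visited in a cyclic, ascending
    order, which may differ from the order used by the actual optimizer.
    """
    if switch_block_every <= 0 or num_blocks <= 0:
        raise ValueError("switch_block_every and num_blocks must be positive")
    # Each block stays active for `switch_block_every` consecutive steps,
    # then the next block takes over; the order wraps around after the last block.
    return (step // switch_block_every) % num_blocks


if __name__ == "__main__":
    # With K = 100 and 4 blocks, print the active block at a few steps.
    for step in (0, 99, 100, 399, 400):
        print(step, active_block_index(step, switch_block_every=100, num_blocks=4))
```

Under these assumptions, a larger `switch_block_every` means more iterations are spent on each sub-problem before the next block becomes active.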

To use `BREAD-SGD`, set `sgd_mode`. The model does not need to be a PEFT model.

```python
from bread import BreadOptimizer

# Before training, wrap the original optimizer.
optimizer = BreadOptimizer(
    base_optimizer=original_optimizer,
    named_parameters_list=list(model.named_parameters()),
    # K in Algorithm 1: the number of iterations spent on each sub-problem.
    switch_block_every=100,
    # Options: "all" or "partial".
    # "all" updates all correction matrices using SGD (BREAD-SGD).
    # "partial" updates only the correction matrices in layers deeper
    # than the active block (BREAD-SGD-partial).
    sgd_mode="all",
)
```
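Both `lora_mode` and `sgd_mode` accept the same two options, `all` and `partial`. The sketch below illustrates which layers would receive correction updates under each option. It is only an illustration, not the selection logic of `BreadOptimizer`: the helper `correction_layers` is hypothetical, and it assumes one block per layer, layers indexed from 0 (closest to the input) to `num_layers - 1`, and "deeper" meaning a larger index.

```python
from typing import List


def correction_layers(active_block: int, num_layers: int, mode: str) -> List[int]:
    """Return the layer indices whose correction matrices are updated.

    Hypothetical helper: assumes one block per layer and that a "deeper"
    layer is one with a larger index than the active block.
    """
    if not 0 <= active_block < num_layers:
        raise ValueError("active_block must be in [0, num_layers)")
    if mode == "all":
        # Every layer's correction matrices are updated.
        return list(range(num_layers))
    if mode == "partial":
        # Only layers strictly deeper than the active block are updated.
        return list(range(active_block + 1, num_layers))
    raise ValueError("mode must be 'all' or 'partial'")


if __name__ == "__main__":
    print(correction_layers(active_block=1, num_layers=4, mode="all"))
    print(correction_layers(active_block=1, num_layers=4, mode="partial"))
```

Under these assumptions, `partial` updates fewer correction matrices as the active block moves deeper, and none when the active block is the last layer.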
