This repository contains code for reproducing the experiments in our submission.

The submission contains two codebases:

1. `blockllm-pretraining-and-glue-experiments`: as the name suggests, reproduces the GLUE experiments and the pretraining results on the C4 dataset.
2. `block-llm-dev`, together with the accompanying `Llama-Factory`, reproduces the large-scale finetuning experiments.

Instructions for setting up the GLUE experiments are in the `blockllm-pretraining-and-glue-experiments` directory.

## Instructions for setting up the large-scale finetuning experiments

1. As mentioned above, the large-scale finetuning experiments require both codebases together.
2. First, install blockllm by navigating to the `block-llm-dev` directory and running `pip install -e .`.
3. Then navigate to the `Llama-Factory` directory and install Llama-Factory with `pip install -e ".[torch,metrics]"`.
4. All experiments can now be run by invoking the corresponding config files in the `configs` directory.
5. Once Llama-Factory and blockllm are installed, start training by passing the desired config file to the following command:

```bash
llamafactory-cli train configs/blockllm.yaml
```
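The installation and launch steps above (steps 2-5) can be collected into a small driver script. The sketch below is hypothetical and not part of the repository: the file name `run_finetuning.sh`, the `install`/`train`/`all` subcommands, and the `BLOCKLLM_DIR`, `LLAMAFACTORY_DIR`, `CONFIG_DIR` and `DRY_RUN` variables are all illustrative assumptions. It runs `install` relative to the current directory and `train` on `configs/*.yaml` relative to wherever you would otherwise type `llamafactory-cli train configs/blockllm.yaml` by hand.

```bash
#!/usr/bin/env bash
# run_finetuning.sh -- hypothetical helper, NOT shipped with this repository.
# Wraps the manual steps: editable installs of blockllm and Llama-Factory,
# then one `llamafactory-cli train` run per YAML file in the configs directory.
set -euo pipefail

# Directory names follow this README; override them via environment variables.
BLOCKLLM_DIR="${BLOCKLLM_DIR:-block-llm-dev}"
LLAMAFACTORY_DIR="${LLAMAFACTORY_DIR:-Llama-Factory}"
CONFIG_DIR="${CONFIG_DIR:-configs}"

# Execute a command, or only print it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Steps 2-3: install both codebases in editable mode.
install_codebases() {
  (cd "$BLOCKLLM_DIR" && run pip install -e .)
  (cd "$LLAMAFACTORY_DIR" && run pip install -e ".[torch,metrics]")
}

# Steps 4-5: train once per config file, sequentially, in glob (alphabetical) order.
train_all_configs() {
  local cfg found=0
  for cfg in "$CONFIG_DIR"/*.yaml; do
    [[ -e "$cfg" ]] || continue   # the glob matched nothing
    found=1
    echo "=== training with $cfg ==="
    run llamafactory-cli train "$cfg"
  done
  if [[ "$found" == 0 ]]; then
    echo "no *.yaml files found in $CONFIG_DIR" >&2
    return 1
  fi
}

# Dispatch only when executed directly (not when sourced).
if [[ "${BASH_SOURCE[0]:-}" == "$0" ]]; then
  case "${1:-}" in
    install) install_codebases ;;
    train)   train_all_configs ;;
    all)     install_codebases; train_all_configs ;;
    *)       echo "usage: $0 {install|train|all}" >&2 ;;
  esac
fi
```

For example, `DRY_RUN=1 bash run_finetuning.sh all` prints the `pip` and `llamafactory-cli` commands without executing them. Running the configs one after another is a deliberate simplification: each run occupies the GPUs for its full duration, so launching them in parallel would need a scheduler of your choice.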
6. Llama-Factory sets up the training data (Alpaca in our case) along with the rest of the training pipeline, including logging.
7. Training outputs, including logs and checkpoints, are written to the `outputs` directory while the model trains.
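To resume or evaluate a run, it helps to locate its newest checkpoint. The function below is a hypothetical sketch (the name `latest_checkpoint` is illustrative, not part of the repository). It assumes that checkpoints are saved as `checkpoint-<step>` subdirectories, the Hugging Face Trainer naming convention, somewhere below `outputs/`. It compares step numbers rather than timestamps, so when several runs share `outputs/`, point it at a single run's directory.

```bash
#!/usr/bin/env bash
# latest_checkpoint.sh -- hypothetical helper, NOT shipped with this repository.
# Source it, then call: latest_checkpoint [root]   (root defaults to outputs)
# Prints the checkpoint-<N> directory with the largest N found under root.

latest_checkpoint() {
  local root="${1:-outputs}" dir step
  if [[ ! -d "$root" ]]; then
    echo "latest_checkpoint: no such directory: $root" >&2
    return 1
  fi
  # Emit "<step><TAB><path>" for every checkpoint-<N> directory, sort
  # numerically on the step, keep the largest, then drop the step column.
  find "$root" -type d -name 'checkpoint-*' \
    | while IFS= read -r dir; do
        step="${dir##*checkpoint-}"
        if [[ "$step" =~ ^[0-9]+$ ]]; then
          printf '%s\t%s\n' "$step" "$dir"
        fi
      done \
    | sort -t $'\t' -k1,1n \
    | tail -n 1 \
    | cut -f2-
}
```

Sorting numerically matters here: a plain lexical sort would rank `checkpoint-500` above `checkpoint-1000`. If no checkpoint has been written yet, the function prints nothing and returns success.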
