# Instructions

## Install Conda environment

```bash
conda env create -f environment.yaml
```


## Training


The following commands run training through Slurm, for instance to train multilingual speech encoders on SIB-Fleurs or NLLB-LLM2Vec on Belebele-Fleurs.
Each command launches a Hydra multirun that queues one Slurm job per combination of the swept hyperparameters (the comma-separated values).

```bash
python -m trident.run --multirun hydra/launcher=slurm experiment=sib run.seed=42,43,44 trainer.max_epochs=40 run.train_batch_size=4 trainer.accumulate_grad_batches=8 module.optimizer.lr=1e-5,2e-5,3e-5 +arch=seamlessm4tv2
```

```bash
env HYDRA_FULL_ERROR=1 python -m trident.run --multirun experiment=sib hydra/launcher=slurm hydra.launcher.time="01:30:00" run.seed=42,43,44 module.optimizer.lr=1e-04,2e-04,3e-04 trainer.accumulate_grad_batches=1 run.train_batch_size=32 +peft=lora model.pretrained_model_name_or_path=NLLB-LLM2Vec-Meta-Llama-31-8B-Instruct-mntp-unsup-simcse +arch=text run.kind=best,worst run.text_column=text datasets=sib_val_test_text_eng
```
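
To estimate how many Slurm jobs a sweep will queue before submitting it, the swept overrides can be expanded by hand. The sketch below is only an illustration: `expand_sweep` is a hypothetical helper, not part of `trident` or Hydra, and it assumes plain comma-separated values (it ignores Hydra's quoting rules and sweep functions such as `choice()` or `range()`).

```python
from itertools import product


def expand_sweep(overrides):
    """Expand comma-separated override values into one override list per job.

    Hypothetical helper for illustration; it does not call Hydra and only
    handles simple `key=a,b,c` overrides.
    """
    keys, choices = [], []
    for override in overrides:
        key, _, value = override.partition("=")
        keys.append(key)
        choices.append(value.split(","))
    # Cartesian product over all value lists: one entry per queued job.
    return [
        [f"{k}={v}" for k, v in zip(keys, combo)]
        for combo in product(*choices)
    ]


# Swept overrides copied from the first training command above.
sib_speech = ["run.seed=42,43,44", "module.optimizer.lr=1e-5,2e-5,3e-5"]
jobs = expand_sweep(sib_speech)
print(len(jobs))  # 3 seeds x 3 learning rates
for job in jobs[:2]:
    print(" ".join(job))
```

Applied to the second command, which additionally sweeps `run.kind=best,worst`, the same count is 3 x 3 x 2 = 18 jobs.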

## Evaluation

Once a training run is complete, `./extract_src_val_ckpts.py` can be used to generate a bash script that queues evaluation jobs on Slurm.
