.. _config:

Training Models on Task Datasets (Commands and Configurations) 
#################################################################

LAVIS provides scripts to pre-train and finetune supported models on standard language-vision tasks, stored at ``lavis/run_scripts/``. 
To replicate the experiments, just run these bash scripts. For example, to train BLIP model on the image-text retrieval task with MSCOCO dataset, we can run

.. code-block::

    bash run_scripts/lavis/blip/train/train_retrieval_coco.sh

Inside the scripts, we can see 

.. code-block:: bash

    python -m torch.distributed.run --nproc_per_node=8 train.py --cfg-path lavis/projects/blip/train/retrieval_coco_ft.yaml

where we start a pytorch distributed training on 8 GPUs (you may change according to your own hardware setup). The ``--cfg-path`` specifys a `runtime configuration file`, specifying
the task, model, dataset and training recipes. 

Available options and their descriptions are as below.

.. LAVIS executes training and evaluation based on arguments specified in the configuration files. The default model and dataset configurations are defined in ``lavis/configs``. The task-specific configurations are defined in ``lavis/projects``. Task-specific configurations have higher priority over the default configurations.

.. The following tables provide explanations for the arguments in the configuration files.

.. list-table::
   :widths: 30 40
   :header-rows: 1

   * - Model Configurations
     - Functionalities
   * - arch
     - | name of the model from the model zoo
       | default: task-dependent
   * - model_type
     - | the type of the model (e.g., base)
       | default: task-dependent
   * - load_pretrained
     - | load pretrained weights
       | default: True (for finetuning task) | False (for pretraining task) 
   * - load_finetuned
     - | load task-specific finetuned weights
       | default: False (for finetuning task) | True (for evaluation) 
   * - pretrained 
     - | URL or local path which stores the pretrained model, defined in the default model configuration file
       | default: task-dependent 
   * - finetuned
     - | URL or local path which stores the finetuned model, defined in the default model configuration file
       | default: task-dependent

.. list-table::
   :widths: 30 50
   :header-rows: 1

   * - Dataset Configurations
     - Functionalities
   * - vis_processor
     - | pre-processing of visual input
       | default: task-dependent
   * - text_processor
     - | pre-processing of text input
       | default: task-dependent
   * - build_info
     - | dataset information including the storage location, defined in the default dataset configuration file
       | default: task-dependent

.. list-table::
   :widths: 30 50
   :header-rows: 1

   * - Runtime Configurations
     - Functionalities
   * - task
     - | name of the task
       | default: task-dependent
   * - lr_sched
     - | learning rate schedular
       | default: linear_warmup_cosine_lr
   * - init_lr
     - | initial learning rate (after warmup)
       | default: task-dependent
   * - min_lr
     - | final learning rate after decay
       | default: task-dependent
   * - warmup_lr
     - | starting learning rate for warmup
       | default: init_lr (no warmup)
   * - lr_decay_rate
     - | learning rate decay per epoch for step_lr_shedule
       | default: 0.9
   * - warmup_steps
     - | number of steps for learning rate warmup
       | default: 0
   * - max_epoch
     - | total number of training epochs
       | default: task-dependent
   * - weight_decay
     - | weight decay coefficient for the optimizer
       | default: 0.05
   * - batch_size_train
     - | batch size during training
       | default: task-dependent
   * - batch_size_eval
     - | batch size during evaluation
       | default: task-dependent
   * - seed
     - | pseudo random number generator seed
       | default: 42
   * - output_dir
     - | directory to store logs, results and checkpoints
       | default: task-dependent
   * - resume_ckpt_path
     - | path of the checkpoint to resume training from
       | default: None
   * - evaluate
     - | only perform evaluation without training
       | default: False
   * - train_splits
     - | dataset splits used for training
       | default: ["train"]
   * - valid_splits
     - | dataset splits used for validation
       | default: ["val"]
   * - test
     - | dataset splits used for test
       | default: ["test"]
   * - device
     - | use cpu or gpu (cuda)
       | default: cuda
   * - world_size
     - | number of processes participating in the job
       | default: 1
   * - dist_url
     - | URL specifying how to initialize the process group
       | default: "env://"
   * - distributed
     - | use distributed training
       | default: True
   * - amp
     - | use automatic mixed precision training
       | default: False

.. list-table::
   :widths: 40 50
   :header-rows: 1

   * - Text Generation Configurations
     - Functionalities
   * - max_len
     - | maximum number of text tokens to generate
       | default: 20 (for image captioning)
   * - min_len
     - | minimum number of text tokens to generate
       | default: 5 (for image captioning)
   * - num_beams
     - | number of beams to perform beam search
       | default: 3

.. list-table::
   :widths: 40 50
   :header-rows: 1

   * - Multimodal Retrieval Configurations
     - Functionalities
   * - negative_all_rank
     - | collect negatives from all processes for the image-text matching loss
       | default: True (for coco)
   * - k_test
     - | number of retrieval candidates ranked from contrastive similarity
       | default: 256 (for coco)
