# Output-Domain focused Biasing (ODB) Implementation

This repository implements the Output-Domain focused Biasing (ODB) method with PyTorch multi-processing Distributed Data Parallel (DDP) training.

ODB takes visual features from a Vision Transformer (ViT) and processes them in the following steps:

 - learns intermediate latent object features in an unsupervised manner,
 - decouples their visual dependencies by assigning new independent embedding parameters,
 - captures structured features optimized for the original classification task,
 - integrates the structured features with the original visual features for the final prediction.
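
The four steps above can be read as a small module placed on top of the backbone tokens. The sketch below is only one possible interpretation, not the repository's implementation: the class name `ODBSketch`, the soft assignment over latent objects, the use of `nn.TransformerEncoder`, mean pooling, and concatenation before the classifier are all assumptions made for illustration.

```python
import torch
import torch.nn as nn


class ODBSketch(nn.Module):
    """Hypothetical reading of the four ODB steps; not the authors' code."""

    def __init__(self, dim=768, object_size=2048, num_layers=6,
                 num_heads=8, num_classes=1000):
        super().__init__()
        # Step 1 (assumed): score each visual token against latent objects.
        # No object labels are used, so the objects are learned unsupervised.
        self.to_object = nn.Linear(dim, object_size)
        # Step 2 (assumed): one independent embedding per latent object,
        # so object representations no longer reuse the visual features.
        self.object_embed = nn.Embedding(object_size, dim)
        # Step 3 (assumed): transformer layers over the object embeddings.
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        # Step 4 (assumed): classify from visual + structured features.
        self.head = nn.Linear(2 * dim, num_classes)

    def forward(self, tokens):
        # tokens: (batch, num_tokens, dim) from a backbone layer.
        assign = self.to_object(tokens).softmax(dim=-1)   # (B, N, K)
        obj_tokens = assign @ self.object_embed.weight    # (B, N, dim)
        structured = self.encoder(obj_tokens).mean(dim=1) # (B, dim)
        visual = tokens.mean(dim=1)                       # (B, dim)
        return self.head(torch.cat([visual, structured], dim=-1))


if __name__ == "__main__":
    # Tiny sizes so the sketch runs quickly on CPU.
    model = ODBSketch(dim=32, object_size=16, num_layers=1,
                      num_heads=4, num_classes=5)
    logits = model(torch.randn(2, 7, 32))
    print(logits.shape)
```

The defaults mirror the values in the training command (`--object_size 2048`, `--num_nvit_layers 6`), but how the actual code maps those flags to layers may differ.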
   
We provide code for training ODB on three different ViT-based backbones on diverse benchmarks.

## Dataset
- ImageNet
- Places365 
- iNaturalist2018

## Backbone

- ViT (An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale)
- MAE (Masked Autoencoders Are Scalable Vision Learners)
- SWAG (Revisiting Weakly Supervised Pre-Training of Visual Perception Models)

## Environment

- PyTorch
- 8 × A100 GPUs

## Train
To train ODB on a ViT-Base backbone with multi-processing distributed training, run the following command.
```bash
torchrun --nproc_per_node=4 train.py \
--amp \
--seed 123 \
--save \
--method timm_augreg_in21k_ft_in1k \
--encoder ViT \
--vit_size Base \
--transfer \
--freeze \
--num_nvit_layers 6 \
--target_layer 11 \
--object_size 2048 \
--dataset ImageNet \
--batch_size 128 \
--epochs 50 \
--lr_scheduler CosineAnnealingLR \
--opt Adam 
```
- The model runs on multiple GPUs using multi-processing Distributed Data Parallel (DDP).
- To use an ImageNet-21K pre-trained ViT as the backbone, set `--method` to `timm_augreg_in21k_ft_in1k`.
- Use `--transfer` and `--freeze` to load the pre-trained model weights and freeze them.
- Set the number of ODB layers with `--num_nvit_layers`.
- Set the backbone layer that provides the visual features with `--target_layer`.
- Set the number of latent objects in ODB with `--object_size`.
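
For reference, the ODB-specific flags described above could be declared roughly as follows. This is a hypothetical sketch using the standard `argparse` module: the function name `build_parser`, the flag types, and the defaults (copied from the example command) are assumptions, and the actual `train.py` may define them differently.

```python
import argparse


def build_parser():
    """Hypothetical parser for the ODB-related flags shown in the command."""
    parser = argparse.ArgumentParser(
        description="Sketch of ODB training flags (assumed, not from train.py)")
    # Which pre-trained weights to use for the backbone.
    parser.add_argument("--method", type=str,
                        default="timm_augreg_in21k_ft_in1k")
    # Load pre-trained backbone weights, and optionally freeze them.
    parser.add_argument("--transfer", action="store_true")
    parser.add_argument("--freeze", action="store_true")
    # Number of ODB layers.
    parser.add_argument("--num_nvit_layers", type=int, default=6)
    # Backbone layer that provides the visual features.
    parser.add_argument("--target_layer", type=int, default=11)
    # Number of latent objects.
    parser.add_argument("--object_size", type=int, default=2048)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--transfer", "--freeze", "--object_size", "1024"])
    print(args.transfer, args.freeze, args.object_size, args.target_layer)
```

Using `store_true` for `--transfer` and `--freeze` matches how the example command passes them without values.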