
Example run script:
torchrun --nproc_per_node=<num_procs_per_node> --nnodes=<num_nodes> --node_rank=<current_node_rank> main_launch.py \
    --dataset imagenet --workers 4 --data-backend pytorch \
    --model <model_name> --num_classes 1000 \
    --epochs <num_epochs> --start-epoch 0 --batch-size <batch_size> --accumulate_steps 1 \
    --model_dir <path_to_pretrained_model> --lr_initial <initial_lr> \
    --weight-decay <weight_decay> --momentum 0.9 --pretrained \
    --world-size <world_size> --rank 0 --dist-backend NCCL --dist-url env:// \
    --ngpus <num_gpus> --multiprocessing-distributed \
    --train_scale --imc --adc --mode analog \
    --weight_bits 4 --input_bits 4 --adc_bits <adc_bit_precision> \
    --print-freq 50 --save_dir ./results
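For illustration, a single-node run on 4 GPUs might look like the following. All concrete values (resnet18, the learning rate, the checkpoint path, etc.) are placeholders chosen for the example, not recommendations:

```shell
# Hypothetical single-node example: 4 processes on 1 node (node rank 0).
# world size = 4, one process per GPU. Substitute your own model name,
# checkpoint path, and hyperparameters.
torchrun --nproc_per_node=4 --nnodes=1 --node_rank=0 main_launch.py \
    --dataset imagenet --workers 4 --data-backend pytorch \
    --model resnet18 --num_classes 1000 \
    --epochs 90 --start-epoch 0 --batch-size 256 --accumulate_steps 1 \
    --model_dir ./checkpoints/resnet18_qat.pth --lr_initial 0.01 \
    --weight-decay 1e-4 --momentum 0.9 --pretrained \
    --world-size 4 --rank 0 --dist-backend NCCL --dist-url env:// \
    --ngpus 4 --multiprocessing-distributed \
    --train_scale --imc --adc --mode analog \
    --weight_bits 4 --input_bits 4 --adc_bits 6 \
    --print-freq 50 --save_dir ./results
```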

Supply values for the following placeholders:
1) distributed training parameters:
# of processes per node
# of nodes
current node rank
world size
# of GPUs
2) training parameters:
model name
# of training epochs
batch size
path to the pretrained model
initial learning rate
weight decay
ADC bit-precision


Note:
1. Alter _set_range in train/layer_base.py to switch between the QAT phase and the ADC phase.
2. Turn off the imc flag (omit --imc) during the QAT phase.

