# zoo-pruning

## Pipeline

The pipeline first prunes the model at initialization (ZO-GraSP), then trains the sparse model with a sparse-CGE-based zeroth-order (ZO) method distributed across multiple processes.

Training details:

1. Compute $f(\theta)$ (keeping the intermediate features) in the main process.

2. Send the intermediate features to the subprocesses, which compute the finite differences $\frac{f(\theta + \mu \cdot e_i) - f(\theta)}{\mu}$ in parallel.

3. Wait for the results from all subprocesses and concatenate them into the full gradient.

4. Update the network in the main process.
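
The four steps above can be sketched as follows. This is a toy, pure-Python illustration and not the repository's implementation: the names `loss`, `cge_chunk`, and `sparse_cge_step` are hypothetical, a thread pool stands in for the RPC subprocesses, a quadratic stands in for the network loss, and the reuse of intermediate features is omitted (each worker simply re-evaluates the full loss).

```python
from concurrent.futures import ThreadPoolExecutor


def loss(theta):
    # Toy objective standing in for f(theta): a simple quadratic.
    return sum((t - 1.0) ** 2 for t in theta)


def cge_chunk(f, theta, f0, coords, mu):
    # Forward-difference estimate for one worker's share of coordinates.
    grads = []
    for i in coords:
        perturbed = list(theta)
        perturbed[i] += mu  # theta + mu * e_i
        grads.append((f(perturbed) - f0) / mu)
    return grads


def sparse_cge_step(f, theta, mask, lr=0.1, mu=1e-3, num_workers=2):
    # Step 1: evaluate f(theta) once in the "main process".
    f0 = f(theta)
    # Only coordinates kept by the pruning mask are queried (sparse CGE).
    active = [i for i, keep in enumerate(mask) if keep]
    chunks = [active[w::num_workers] for w in range(num_workers)]
    # Steps 2-3: each worker handles its chunk; the main process waits for all.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        results = list(pool.map(lambda c: cge_chunk(f, theta, f0, c, mu), chunks))
    # Scatter the per-worker results back into one gradient vector.
    grad = [0.0] * len(theta)
    for coords, values in zip(chunks, results):
        for i, g in zip(coords, values):
            grad[i] = g
    # Step 4: plain SGD update in the main process.
    new_theta = [t - lr * g for t, g in zip(theta, grad)]
    return new_theta, grad
```

Calling `sparse_cge_step(loss, [0.0] * 4, [True, False, True, False])` updates only the two unmasked coordinates; the pruned coordinates receive a zero gradient and cost no function queries, which is where the savings of sparse CGE come from.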

## Code Structure

+ algorithm

  + prune: pruning-at-initialization algorithms. Usage: `from algorithm.prune import global_prune`
  
  + zoo: zeroth-order optimization algorithms
    
    + gradient_estimate.py: single-process RGE (randomized gradient estimation) and CGE (coordinate-wise gradient estimation)
    
    + distributed_cge.py: distributed CGE algorithms, implemented with `torch.distributed.rpc`
    
+ analysis: tools for collecting experiment results

+ data: builds the data loaders and provides the number of classes. Usage: `from data import prepare_dataset`

+ experiments: main executable files

  + sparse_gradient_training.py: distributed CGE training. Example usage: `python experiments/sparse_gradient_training.py --network resnet20 --dataset cifar10 --score layer_wise_random --sparsity 0.9 --sparsity-ckpt zo_grasp+_0.9 --gpus 0,1,2,3 --lr 0.1 --master-port 29500`
  Note: when running multiple jobs at the same time, assign a different `--master-port` to each job.
  
+ models

  + distributed_model.py: defines how the subprocesses react to signals from the main process
  
  + resnet_s.py: small ResNet models with feature reuse
  
+ scripts

+ tools

+ cfg.py: path configuration

## More Experiments

To be released.