
# Towards Stable Learning in Predictive Coding Networks

Repository for *Towards Stable Learning in Predictive Coding Networks*.
We analyze the instability of predictive coding networks and propose a method to stabilize them.

![figures/fig4.pdf](figures/fig4.pdf)
  
  
## project structure  

* main_stab.py: starting file to run the code  
* anal: core folder for running experiments and analysis
    * main_stab.py -> anal/base.py -> anal/pc.py  
* plot: plot results  
* model: folder to get models  
    * SPCN: pc_dense.py, PCL: pc_lbyl.py, PCN: pc.py  
* train: train related utility folder
* tool: utility folder  

### analysis related

- base.py  
    - main file to run analysis experiments or to train and evaluate a model  
- pc.py  
    - run predictive coding network -> pc.py, pc_dense.py, pc_lbyl.py  
- util.py, log.py, debug.py, reg.py  
    - utility files
- pmap_th.py
    - compute numerical value of length p and q  

### plot related

- maps.py  
    - plot line plots of various x-axis & legend, bar plots, scatter plots, and heatmap
- util.py, const.py  
    - utility files

### data related  
  
- dl_getter.py: get dataset and load dataset (MNIST, SVHN, CIFAR10, CIFAR100)
- anal_data.py: get random Gaussian data or data generated from randomly initialized neural network  

### model related  

- model_getter.py, model_io.py  
    - get models from folder `models`  
        - refer to the folder for detailed understanding   
- pc.py  
    - implementation of the conventional PCN; all models use the `LocPC` class to get local modules.  
- pc_lbyl.py  
    - implementation of PCL (sequential training only)
- pc_dense.py  
    - implementation of PCD (SPCN)

  
## how to run experiments
  
### length analysis

- MOVE TO `length_anlysis` FOLDER AND RUN THESE COMMANDS.

* fig 2

command
````bash
python main_stab.py --dataset random --T 500 --n_layers 30 --latent_dim 100 --min_val_e 0.05 --bsz 128 --method pc --loss_sum --act linear --pos --n_conds_e 1 --n_runs 5 --z_init gaussian --orthogonal_testing --theory --sigma_b 0.0 --exp fig2
````
````bash
python main_stab.py --dataset random --T 500 --n_layers 30 --latent_dim 100 --min_val_e 0.05 --bsz 128 --method pc --loss_sum --act linear --pos --n_conds_e 1 --n_runs 5 --z_init gaussian --orthogonal_testing --theory --sigma_b 0.0 --exp fig2_panel_e_iter_sb
````
````bash
python main_stab.py --dataset random --T 500 --n_layers 30 --latent_dim 100 --min_val_e 0.05 --bsz 128 --method pcd --w_reg --z_reg --b_reg --loss_sum --act linear --pos --n_conds_e 1 --n_runs 5 --z_init gaussian --orthogonal_testing --theory --sigma_b 0.0 --exp fig2_spcn
````
````bash
python main_stab.py --dataset random --T 500 --n_layers 30 --latent_dim 100 --min_val_e 0.05 --bsz 128 --method pcd --w_reg --z_reg --b_reg --loss_sum --act linear --pos --n_conds_e 1 --n_runs 5 --z_init gaussian --orthogonal_testing --theory --sigma_b 0.0 --exp fig2_panel_m_iter_sb
````

- appendix fig 11 results will also be plotted
- appendix fig 14: change `--act linear` to another activation function (tanh, relu, silu, selu) and delete `--orthogonal_testing --theory`
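
The appendix fig 14 variants can be generated rather than edited by hand. The sketch below only builds the command lines: it starts from the first fig 2 command (with an explicit `python` prefix), swaps the `--act` value, and drops `--orthogonal_testing` and `--theory`. The helper name `fig14_commands` is hypothetical and not part of the repository, and all other flags (including `--exp fig2`) are assumed to stay unchanged.

```python
import shlex

# First fig 2 command (PCN variant), with an explicit `python` prefix.
BASE_CMD = (
    "python main_stab.py --dataset random --T 500 --n_layers 30 --latent_dim 100 "
    "--min_val_e 0.05 --bsz 128 --method pc --loss_sum --act linear --pos "
    "--n_conds_e 1 --n_runs 5 --z_init gaussian --orthogonal_testing --theory "
    "--sigma_b 0.0 --exp fig2"
)

ACTIVATIONS = ["tanh", "relu", "silu", "selu"]
DROPPED_FLAGS = {"--orthogonal_testing", "--theory"}


def fig14_commands(base_cmd=BASE_CMD):
    """Return one argv list per activation (hypothetical helper)."""
    # Remove the two flags that only apply to the linear case.
    base = [tok for tok in shlex.split(base_cmd) if tok not in DROPPED_FLAGS]
    act_idx = base.index("--act") + 1  # position of the activation name
    commands = []
    for act in ACTIVATIONS:
        argv = list(base)
        argv[act_idx] = act
        commands.append(argv)
    return commands


if __name__ == "__main__":
    for argv in fig14_commands():
        print(shlex.join(argv))
```

Each returned list can be passed to `subprocess.run(argv, check=True)` from inside the `length_anlysis` folder to launch the run.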

corresponding figure  
  
![figures/fig2.pdf](figures/fig2.pdf)
  
* fig 3  
  
command
````bash
python main_stab.py --dataset random --T 500 --n_layers 30 --latent_dim 100 --min_val_e 0.05 --min_val_sw 1.0 --n_conds 1 --bsz 128 --method pc --loss_sum --act linear --pos --n_conds_e 1 --n_runs 5 --z_init gaussian --sigma_b 0.1 --exp fig3
````

- appendix fig 5: change `--z_init gaussian` to `--z_init ff`  
- appendix fig 12 results will also be plotted  
  
corresponding figure  
  
![figures/fig3.pdf](figures/fig3.pdf)

* appendix fig 6

command
````bash
python main_stab.py --train --dataset mnist --T 20 --step_eta --latent_dim 100 --bsz 128 --last_cls --act tanh --loss_sum --min_val_sw 5.4 --sigma_b 0.1 --min_val_e 0.05 --n_conds 1 --n_conds_e 1 --n_layers 4 --method pc --epochs 1 --n_runs 5 --z_init use_db --arch fc --exp fig6 --len_all
````
````bash
python main_stab.py --train --dataset mnist --T 20 --step_eta --latent_dim 100 --bsz 128 --last_cls --act tanh --loss_sum --min_val_sw 5.4 --sigma_b 0.1 --min_val_e 0.05 --n_conds 1 --n_conds_e 1 --n_layers 4 --method pcd --b_reg --z_reg --w_reg --epochs 1 --n_runs 5 --z_init use_db --arch fc --exp fig6 --len_all
````

- the commands above use `--T 20`; for the other settings, set `--T` to 10 or 50
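
The inference-step settings (`--T` 10, 20, 50) for both methods can be enumerated with a small script. This is a sketch that only assembles the command lines from the flags shown above. The helper name `fig6_commands` is hypothetical, and it assumes the order of flags does not matter to the script's argument parser.

```python
import shlex

# Flags shared by both appendix fig 6 commands (everything except --T and the method).
COMMON = (
    "python main_stab.py --train --dataset mnist --step_eta --latent_dim 100 "
    "--bsz 128 --last_cls --act tanh --loss_sum --min_val_sw 5.4 --sigma_b 0.1 "
    "--min_val_e 0.05 --n_conds 1 --n_conds_e 1 --n_layers 4 --epochs 1 "
    "--n_runs 5 --z_init use_db --arch fc --exp fig6 --len_all"
)

# Method-specific flags, taken from the two commands above.
METHOD_FLAGS = {
    "pc": ["--method", "pc"],
    "pcd": ["--method", "pcd", "--b_reg", "--z_reg", "--w_reg"],
}

T_VALUES = [10, 20, 50]


def fig6_commands():
    """Return one argv list per (method, T) pair (hypothetical helper)."""
    commands = []
    for method_flags in METHOD_FLAGS.values():
        for t in T_VALUES:
            commands.append(shlex.split(COMMON) + ["--T", str(t)] + method_flags)
    return commands


if __name__ == "__main__":
    for argv in fig6_commands():
        print(shlex.join(argv))
```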

corresponding figure  

![figures/appendix-earlystop.pdf](figures/appendix-earlystop.pdf)

* appendix fig 7

command
````bash
python main_stab.py --data_opt random --n_layers 30 --act linear --eta 0.2 --T 5 --min_val_s 0.1 --step_val_s 0.1 --n_conds 20 --loss_sum --plot_term_exp --method pc
````
  
corresponding figure  


### training task

- MOVE TO `training_spcn` FOLDER AND RUN THESE COMMANDS.

command
````bash
python run.py
````

* you have to change the arguments INSIDE `run.py`.

* experimental options
    * line 21: $\sigma_w$
        * explosion experiment: [0.1, 0.2, 0.5, 1.0, 2.0, 5.0, 10.0]
        * layer depth experiment: [1.0]
    * line 22: $\sigma_b$
    * line 23: inference step T
        * explosion experiment: [500]
        * layer depth experiment: [10]
    * line 24: latent initialization method (gaussian, memorized gaussian, feedforward)
        * explosion experiment: ['gaussian']
        * layer depth experiment
            * MNIST: ['use_db']
            * CIFAR10: ['ff']
    * line 25: layer depth L
        * explosion experiment: [30]
        * layer depth experiment: [3,4,6,9,13]
    * line 26: method list (spcn, spcn-r, pcn+r, pcn)
        * for ablation, insert spcn-r, pcn+r
    * line 27: dataset & architecture combination (('random', 'fc'), ('mnist', 'fc'), ('cifar10', 'cnn'))
    * line 33: boolean for training
        * explosion experiment: False
        * layer depth experiment: True
    * line 79: boolean for lyapunov
        * explosion experiment: True
        * layer depth experiment: False

* things to customize
    * set N_GPU in line 29
    * set wandb_entity in line 32
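
To keep the two experiment setups apart while editing `run.py`, the options listed above can be collected into presets. The sketch below is a checklist generator, not the contents of `run.py`: the dictionary keys and the helper `checklist` are hypothetical names, and only the values and line numbers are taken from the list above.

```python
# Hypothetical key names; the actual variable names in run.py may differ.
EXPERIMENT_PRESETS = {
    "explosion": {
        "sigma_w": [0.1, 0.2, 0.5, 1.0, 2.0, 5.0, 10.0],
        "T": [500],
        "z_init": ["gaussian"],
        "n_layers": [30],
        "train": False,
        "lyapunov": True,
    },
    "layer_depth": {
        "sigma_w": [1.0],
        "T": [10],
        # The latent initialization depends on the dataset in this experiment.
        "z_init": {"mnist": ["use_db"], "cifar10": ["ff"]},
        "n_layers": [3, 4, 6, 9, 13],
        "train": True,
        "lyapunov": False,
    },
}

# Line numbers in run.py as documented above.
RUN_PY_LINES = {
    "sigma_w": 21,
    "T": 23,
    "z_init": 24,
    "n_layers": 25,
    "train": 33,
    "lyapunov": 79,
}


def checklist(preset_name, dataset="mnist"):
    """Return 'line N: key = value' strings for one preset (hypothetical helper)."""
    lines = []
    for key, value in EXPERIMENT_PRESETS[preset_name].items():
        if isinstance(value, dict):  # dataset-dependent option
            value = value[dataset]
        lines.append(f"line {RUN_PY_LINES[key]}: {key} = {value!r}")
    return lines


if __name__ == "__main__":
    for line in checklist("layer_depth", dataset="cifar10"):
        print(line)
```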
  
  
### arguments explanation  

* method: which method you want to run (pc, pcd)  
* dataset: which dataset you want to run (for training -> mnist, for length analysis -> random)  
* T: inference steps to update latent  
* eta: inference learning rate of latent  
* step_eta: whether to schedule eta linearly
* comp_eta: whether to schedule eta exponentially, by comparing to the previous loss
* latent_dim: latent dimension  
* n_layers: number of layers  
* bsz: batch size  
* act: which activation function to use (for training -> tanh, for length analysis -> linear)  
* b_reg, z_reg, w_reg: whether to use normalization / regularization  
* min_val_sw: starting value of sigma_w during experiment
* step_exp: exponentially increase value of sigma_w condition (true for length analysis)  
* n_conds: number of conditions to run on  
* n_runs: number of runs with varying seed  
* theory: compute and plot theoretical results along with experimental results
* heatmap: whether to plot heatmap
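
The distinction between valued arguments and boolean switches can be illustrated with a small `argparse` sketch. This is not the repository's parser: the types are inferred from the example commands, defaults are left unset, `step_exp` is omitted because its form is not clear from the description, and flags outside this list are simply passed through as unknown.

```python
import argparse


def build_parser():
    """Hypothetical parser covering a subset of the flags documented above."""
    parser = argparse.ArgumentParser(
        description="sketch of the documented flags", allow_abbrev=False
    )
    # Valued arguments (types inferred from the example commands).
    parser.add_argument("--method", choices=["pc", "pcd"])
    parser.add_argument("--dataset")
    parser.add_argument("--act")
    parser.add_argument("--eta", type=float)
    parser.add_argument("--min_val_sw", type=float)
    for name in ("T", "latent_dim", "n_layers", "bsz", "n_conds", "n_runs"):
        parser.add_argument(f"--{name}", type=int)
    # Boolean switches: present means True, absent means False (an assumption).
    for name in ("step_eta", "comp_eta", "b_reg", "z_reg", "w_reg", "theory", "heatmap"):
        parser.add_argument(f"--{name}", action="store_true")
    return parser


if __name__ == "__main__":
    argv = "--method pcd --T 20 --step_eta --w_reg --act tanh --sigma_b 0.1".split()
    # parse_known_args tolerates flags that this sketch does not define.
    args, unknown = build_parser().parse_known_args(argv)
    print(args.method, args.T, args.step_eta, args.b_reg, unknown)
```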