### Generate Topic Matrix for each topic model
Before running anything else, first run `generate_topic_mat.py` to generate the topic matrix for each of the four topic models we experiment with. The script takes no arguments.

### Synthetic document experiments
The folders `src_pure`, `src_LDA`, `src_CTM`, and `src_PAM` contain the pure, LDA, CTM, and PAM synthetic experiments, respectively. In each folder, run `sim_example.py` to start the synthetic experiment:

`python3 sim_example.py [number of test documents] [number of hidden words per training document] [neural network hidden dimension] [number of layers in neural network] [number of training epochs]` 

For example, `python3 sim_example.py 200 6 768 8 100` trains a model with hidden dimension 768 and 8 layers for 100 epochs, where each training document has 6 hidden ("target") words, and then tests the model on 200 test documents.
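
Because the five arguments are positional, it is easy to pass them in the wrong order. The following is a minimal sketch of a wrapper that builds the command from named parameters. The helper names `build_sim_command` and `run_sim` and the `EXPERIMENT_DIRS` mapping are hypothetical and not part of this repository. The sketch assumes `sim_example.py` is launched from inside its own folder.

```python
import subprocess
from pathlib import Path

# Folder names as listed in this README; the short keys are our own labels.
EXPERIMENT_DIRS = {
    "pure": "src_pure",
    "lda": "src_LDA",
    "ctm": "src_CTM",
    "pam": "src_PAM",
}


def build_sim_command(n_test_docs, n_hidden_words, hidden_dim, n_layers, n_epochs):
    """Return the argv list for sim_example.py in the documented positional order."""
    values = (n_test_docs, n_hidden_words, hidden_dim, n_layers, n_epochs)
    for value in values:
        # Hypothetical sanity check: every argument is taken to be a positive integer.
        if not isinstance(value, int) or value <= 0:
            raise ValueError(f"expected a positive integer, got {value!r}")
    return ["python3", "sim_example.py", *(str(v) for v in values)]


def run_sim(model, n_test_docs, n_hidden_words, hidden_dim, n_layers, n_epochs,
            repo_root="."):
    """Run one synthetic experiment from inside its folder (assumed layout)."""
    cwd = Path(repo_root) / EXPERIMENT_DIRS[model]
    command = build_sim_command(n_test_docs, n_hidden_words, hidden_dim,
                                n_layers, n_epochs)
    # check=True raises CalledProcessError if the script exits with an error.
    return subprocess.run(command, cwd=cwd, check=True)
```

With this sketch, `run_sim("ctm", 200, 6, 768, 8, 100)` would issue the example command above from the `src_CTM` folder.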

For CTM and PAM, the final model weights are saved in the `savedmodels` directory. Afterwards, run `metrics.py` in the `src_CTM` or `src_PAM` folder to measure the major-topic recovery rate:

`python3 metrics.py [alpha] [model weights file name (as in savedmodels directory)]`

### Statistical Inference assuming a specific topic prior
The code in the `stats_inference` folder runs posterior inference under an assumed topic prior. These are the baselines we compare our approach against in the paper. Run it as follows:

`python3 run.py [model type]`

where `[model type]` is one of `pure`, `lda`, `ctm`, or `pam`.
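
To run the baseline for all four priors in one go, a small driver along these lines may help. This is only a sketch: `build_inference_command` and `run_all` are hypothetical helpers, not part of the repository. It assumes `run.py` is launched from inside `stats_inference` and accepts exactly the lowercase names listed above.

```python
import subprocess

# The four model types documented for run.py.
MODEL_TYPES = ("pure", "lda", "ctm", "pam")


def build_inference_command(model_type):
    """Return the argv list for run.py, rejecting undocumented model types."""
    if model_type not in MODEL_TYPES:
        raise ValueError(
            f"unknown model type {model_type!r}; expected one of {MODEL_TYPES}"
        )
    return ["python3", "run.py", model_type]


def run_all(cwd="stats_inference"):
    """Run posterior inference for each model type in turn (assumed folder layout)."""
    for model_type in MODEL_TYPES:
        # check=True stops the loop at the first failing run.
        subprocess.run(build_inference_command(model_type), cwd=cwd, check=True)
```

Calling `run_all()` from the repository root would then execute the four runs sequentially.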

### Visualization
The code for visualizing the results in Section 5 and Appendix C of the paper is in the `visualization_sec5` folder.

