This file contains the information necessary to reproduce the experiments in the paper.

# Environment

- We used PyTorch 1.9.1 with CUDA 11.3, but other recent versions should also work.
- Install the additional packages with
> pip install persim git+https://github.com/shizuo-kaji/CubicalRipser_3dim

- For logging and visualisation of the results, we used TensorBoard.
- A random seed can be specified with, e.g., `-s 42` on the command line.
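For reproducibility, a seed flag like `-s 42` typically seeds all random number generators in use. A minimal sketch of such a helper (an assumption about what the script does, not its actual code; in the real script PyTorch would also be seeded via `torch.manual_seed`):

```python
import random

def set_seed(seed):
    # Hypothetical seeding helper: seed Python's RNG so runs are repeatable.
    # The actual script presumably also calls torch.manual_seed(seed) and
    # torch.cuda.manual_seed_all(seed); numpy would get np.random.seed(seed).
    random.seed(seed)

set_seed(42)
first_draw = random.random()
set_seed(42)
assert random.random() == first_draw  # same seed, same sequence
```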

# Evaluation
Download the CIFAR100 and Animal datasets with

    > python ImageDatasetsDownloader.py --dataset CIFAR100
    > python ImageDatasetsDownloader.py --dataset animal

Fine-tuning with the pretrained weights is performed by

- for CIFAR100
> python main.py -lm finetuning -t data/CIFAR100/train/ -e 90 -pw result/XX/resnet50_pt_epoch90.pth

- for Animal
> python main.py -lm finetuning -t data/animal/train/ -e 300 -pw result/XX/resnet50_pt_epoch90.pth

where XX is the directory name generated during the pretraining detailed below.
If XX is set to "imagenet", the ImageNet-pretrained model that ships with the torchvision library is used.
If `-pw` is not specified, the model is trained from scratch, yielding the results for "Scratch" in the paper.


# for Table 1

> python main.py -lm pretraining -t2 generate -c cache -lt xxx -n 400000

where xxx is one of
- persistence_image (for PH-PI)
- persistence_histogram (for PH-HS)
- persistence_betticurve (for PH-BC)
- persistence_landscape (for PH-LS)
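For intuition on these vectorizations, a persistence Betti curve counts, at each threshold, how many intervals of a persistence diagram are alive. A minimal sketch of this standard definition (not the repository's implementation; the toy diagram below is made up):

```python
def betti_curve(diagram, thresholds):
    # Number of persistence intervals (birth, death) alive at each threshold t,
    # i.e. those with birth <= t < death.
    return [sum(1 for birth, death in diagram if birth <= t < death)
            for t in thresholds]

# Toy diagram with two intervals, evaluated on a coarse grid.
print(betti_curve([(0.0, 2.0), (1.0, 3.0)], [0.0, 1.0, 2.0, 3.0]))  # [1, 2, 1, 0]
```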

This produces a pretrained weight file under the directory "result", which is used for fine-tuning in the Evaluation section above.
A TensorBoard log is also written to a file named "events.out.*", which can be loaded into TensorBoard
to produce the graphs in Figures 5 and 6.

The Label models are pretrained by 

> python main.py -lm pretraining -t2 xxx -lt class -e yyy

where xxx is chosen from {data/CIFAR100/train/, data/animal/train/}
and yyy is set to {90, 300}, respectively.
The class labels (defined by the directory names) are used.
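Deriving labels from directory names works as in torchvision's `ImageFolder`: each subdirectory of the training root is one class. A stdlib-only sketch of that convention (the helper name is ours, not the repository's):

```python
from pathlib import Path

def class_labels(root):
    # One class label per subdirectory of the dataset root,
    # sorted by name, mirroring torchvision's ImageFolder convention.
    return sorted(p.name for p in Path(root).iterdir() if p.is_dir())
```

For example, `data/CIFAR100/train/` with one subdirectory per class would yield the 100 class names in alphabetical order.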


Alternatively, pretraining and finetuning can be performed sequentially by

> python main.py -t data/CIFAR100/train/ -t2 generate -c cache -n 400000 -lm combined -lt persistence_image


# for Table 2

For varying the output dimension,

> python main.py -lm pretraining -t2 generate -c cache -n 200000 -nd xxx

where xxx indicates the output dimension and is chosen from {100,200,400,800}.
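Here the output dimension is simply the length of the feature vector the persistence diagram is mapped to. A hypothetical lifetime-histogram vectorization illustrates the role of this parameter (an assumption for illustration, not the repository's code; the bin layout is made up):

```python
def lifetime_histogram(diagram, n_bins, max_lifetime):
    # Hypothetical vectorization: a histogram of interval lifetimes.
    # n_bins plays the role of the output dimension (the -nd option).
    hist = [0] * n_bins
    for birth, death in diagram:
        lifetime = death - birth
        # clamp into the last bin so lifetime == max_lifetime stays in range
        idx = min(int(lifetime / max_lifetime * n_bins), n_bins - 1)
        hist[idx] += 1
    return hist

print(lifetime_histogram([(0.0, 1.0), (0.0, 2.0)], 4, 2.0))  # [0, 0, 1, 1]
```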


For varying the dataset size,

> python main.py -lm pretraining -t2 generate -c cache -n xxx

where xxx indicates the number of generated images for training and is chosen from {50000,200000,400000,800000}. 


For using real datasets for pretraining with PH, 

> python main.py -lm pretraining -t2 xxx -c cache -e yyy

where xxx indicates the dataset used for pretraining and is chosen from {data/CIFAR100/train/, data/animal/train/}
and yyy is set to {90, 300}, respectively.



