# ****

***Following the double-blind policy, we have removed path names that include the authors' names. Please set your own
data paths manually.***

This project requires several Python packages, including DGL, PyTorch Geometric, Keras, and other common packages for deep 
learning research.
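
A quick way to check the environment before running anything is sketched below. The import names are assumptions (in particular, `torch_geometric` is assumed to be the module behind "torch geometry"), and `find_missing` is a hypothetical helper that is not part of this repository.

```python
import importlib.util

# Assumed import names for the packages listed above; adjust them to
# match your own environment.
REQUIRED_MODULES = ["dgl", "torch", "torch_geometric", "keras"]


def find_missing(modules):
    """Return the module names that cannot be located in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages were found.")
```

The check only locates the packages; it does not import them or verify their versions.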

1. Use prepare_data/dataset_utils.py to downsample the dataset and generate the features and pairwise distance data.
2. Use prepare_data/bo to search for the optimal Z* for t-SNE and UMAP.
3. Use the dataset object in gnn/LargeCompleteMVGraphDatasets.py to calculate the PEs; it provides a 
"precompute" method.
4. Set the paths carefully. There are four paths: 
cdist_path (path to the pairwise distances), visual_path (path to Z* from t-SNE),
visual_path_umap (path to Z* from UMAP), and precomputed_pe_path (path to the pre-computed PEs).
5. Use gnn/train_gin_large_complete_g_mv*.py to train the AutoDV model.
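
Because the four paths in step 4 must be set by hand, it can help to collect them in one place and verify them before training. The following is a minimal sketch, not the repository's own mechanism: `DataPaths`, its `missing` method, the root directory, and the file names are all hypothetical, and only the four path names come from the steps above.

```python
from dataclasses import dataclass, fields
from pathlib import Path


@dataclass
class DataPaths:
    """The four paths named in step 4 (this container class is hypothetical)."""
    cdist_path: Path            # pairwise distances
    visual_path: Path           # Z* from t-SNE
    visual_path_umap: Path      # Z* from UMAP
    precomputed_pe_path: Path   # pre-computed PEs

    def missing(self):
        """Return the names of the paths that do not exist on disk."""
        return [f.name for f in fields(self)
                if not Path(getattr(self, f.name)).exists()]


if __name__ == "__main__":
    # Placeholder root and file names; replace them with your own locations.
    root = Path("/path/to/your/data")
    paths = DataPaths(
        cdist_path=root / "cdist",
        visual_path=root / "z_tsne",
        visual_path_umap=root / "z_umap",
        precomputed_pe_path=root / "pe",
    )
    print("Paths that still need to be set:", paths.missing())
```

The values would then have to be copied into the scripts wherever the corresponding variables are defined.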

All raw datasets used in the project can be found at 
https://drive.google.com/drive/folders/1Cv15e5A5W5a9K8OZkAA-TVRURvMpdgbL?usp=sharing
This is an anonymous file-sharing link. The data preprocessing code can be found in the prepare_data folder. 
To avoid potential copyright problems, we clarify the data sources here. For image data, we access the raw data of 
MNIST, FMNIST, and CIFAR10 through the PyTorch interface https://docs.pytorch.org/vision/main/datasets.html.

For gene data, we download Campbell from https://github.com/perslab/campbell-2017. We download PBMC68k from
https://www.10xgenomics.com/datasets/fresh-68-k-pbm-cs-donor-a-1-standard-1-1-0
We download Mouse Retina from
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE201402
We download Baron Human from https://www.ncbi.nlm.nih.gov/gds/?term=GSE84133[Accession].

For tabular data from the UCI repository, we download each dataset by searching for its name at https://archive.ics.uci.edu/, which is 
publicly accessible. 


Note that AutoDV has two modes, "parallel gt" and "none parallel gt". 
In "parallel gt" mode, the graph transformers accept the k-view graphs from the very beginning and process the input 
in parallel with the GINs. We use this mode for AutoDV-UMAP.

To evaluate the model, use the code in gnn/transfer_exp or gnn/large_res. 
gnn/runing_time_exp provides code that runs on dummy data.

The repository may contain some irrelevant code left over from the early stages of this project.

Aside from the main training code, some code, such as that for data preparation, result reading, and plotting,
may not be well written. Some useful code may be commented out. Use it carefully.


