# projectx_submission

The genomic input is a VCF file built from the ADNI whole-genome sequencing (WGS) SNP data. To generate it:
1. Log in at ida.loni.usc.edu to access the ADNI database (adni.loni.usc.edu), then go to Download > Genetic Data and download the WGS SNP data (the `WGS_Omni25_BIN_wo_ConsentsIssues` fileset used in step 4).
2. Install plink, using one of the following:
   * With conda: add bioconda as an additional channel with `conda config --add channels bioconda`, then run `conda install -c bioconda plink`.
   * Manually: download plink from https://www.cog-genomics.org/plink/ and copy all files from the plink zip directly into your working directory, so that the associated files are where the commands below expect them.
3. Download the corresponding text files: `wholeattempt.txt` (noncoding SNPs) and `codingsnps.txt` (coding SNPs).
4. Run the following command (it also appears as a comment at the very top of `df.py`): `./plink --bfile WGS_Omni25_BIN_wo_ConsentsIssues --extract wholeattempt.txt --recode vcf-iid --out ./new_vcf`. If plink was installed through conda, use `plink` instead of `./plink`.
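
The command in step 4 can also be assembled from Python, which makes it easier to switch between the conda install (`plink`) and the manual download (`./plink`). The following is a minimal sketch, not part of the repository: the helper name `build_plink_command` is hypothetical, and it only builds the argument list using the flags already shown above.

```python
import shutil


def build_plink_command(bfile, extract_file, out_prefix, plink_exe=None):
    """Return the plink argument list for extracting SNPs into a VCF.

    Hypothetical helper: if plink_exe is not given, prefer a `plink`
    found on PATH (e.g. conda install), else fall back to `./plink`
    (manual download copied into the working directory).
    """
    if plink_exe is None:
        plink_exe = "plink" if shutil.which("plink") else "./plink"
    return [
        plink_exe,
        "--bfile", bfile,           # prefix of the binary PLINK fileset
        "--extract", extract_file,  # text file listing the SNPs to keep
        "--recode", "vcf-iid",      # write a VCF using individual IDs
        "--out", out_prefix,        # output prefix
    ]


cmd = build_plink_command(
    "WGS_Omni25_BIN_wo_ConsentsIssues",
    "wholeattempt.txt",
    "./new_vcf",
    plink_exe="plink",
)
print(" ".join(cmd))
```

To actually run the command, the list can be passed to `subprocess.run(cmd, check=True)` from the directory containing the input files.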

Pre-processing steps:

* Download the UC Berkeley file (UCBERKELEYAV1451) from the Study Data section of ADNI.
* You will also need `ROIs.csv`.
* First run `z_score.py`, which generates `rois_zscores.csv`.
* Then use the z-scores in `rois_zscores.csv` as the input to `obtain_subtypes.ipynb`, which outputs `subtypes.csv`.
* Then use `subtypes.csv` as the input to `labels.py`, which generates `patientLabels.csv`.
* `patientLabels.csv` maps each patient in the SNP (genetic) data to their corresponding label.
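
For readers unfamiliar with the z-score step, the sketch below illustrates the general idea on a toy table. It is an assumption-laden illustration, not the code in `z_score.py`: it standardises each ROI column against the mean and population standard deviation of all subjects, whereas the actual script may use a different reference group or formula. The function name `zscore_columns` and the ROI names are hypothetical.

```python
from statistics import mean, pstdev


def zscore_columns(table):
    """Standardise each column of a {roi_name: [values per subject]} table.

    Assumption: z = (value - column mean) / column population std.
    Columns with zero spread are mapped to all zeros to avoid dividing by 0.
    """
    result = {}
    for roi, values in table.items():
        mu = mean(values)
        sigma = pstdev(values)
        if sigma == 0:
            result[roi] = [0.0 for _ in values]
        else:
            result[roi] = [(v - mu) / sigma for v in values]
    return result


# Toy data with made-up ROI names, three subjects each
toy = {"roi_a": [1.0, 2.0, 3.0], "roi_b": [5.0, 5.0, 5.0]}
z = zscore_columns(toy)
print(z)
```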

Requirements.txt Usage:

	conda create --name <env> --file requirements.txt

Models: 

* random_forest.py: Random Forest model with train/test, heatmap, and cross-validation code.
* decision_tree.py: Decision Tree model with train/test, heatmap, and cross-validation code.
* clf.py: XGBoost Decision Tree model with train/test, heatmap, and cross-validation code.
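
Each model script includes cross-validation code. As a library-free illustration of what k-fold cross-validation does, the sketch below splits sample indices into k train/test partitions. It is only a sketch under assumptions: the scripts themselves may rely on a library routine instead, and the function name `k_fold_indices` is hypothetical.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for each of k folds.

    Indices are shuffled once with a fixed seed, then dealt into k folds
    whose sizes differ by at most one. Every sample is used for testing
    exactly once across the k splits.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(10, 5))
for train, test in splits:
    print(len(train), len(test))
```

A model would be fitted on the `train` indices and scored on the `test` indices of each split, with the per-fold scores averaged at the end.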

