Data Files:
    data/clustering_results.txt
        - output of Agglomerative Clustering with K=1000 clusters and linkage='ward'
        - tab separated values with format:    "Word"	"Occurrence Counter"	"Sentence ID"	"Token ID"	"Cluster ID"
        
    data/test_results.txt    
        - predictions of a Logistic Regression classifier trained on the cluster labels in clustering_results.txt
        - tab separated values with format:    "Word"	"Occurrence Counter"	"Sentence ID"	"Token ID"	"Cluster ID"
        
    data/human_annotation.txt
        - human-annotated concept tags for the clusters in clustering_results.txt
        - tab separated values with format: "Cluster ID"	"Concept Tag"
        
    data/train_sentences.txt
        - json file containing sentences used to extract tokens in clustering_results.txt file
        
    data/test_sentences.json
        - json file containing sentences used to extract tokens in test_results.txt file
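
The tab-separated formats listed above can be read with plain string splitting. The sketch below is illustrative only: it assumes the files have no header row and that the quotes in the format description name the columns rather than appear in the data. All fields are kept as strings because this README does not state their types. The function names and the sample rows are made up for the example.

```python
import io
from collections import defaultdict


def read_cluster_rows(handle):
    """Yield (word, count, sentence_id, token_id, cluster_id) as strings."""
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        word, count, sentence_id, token_id, cluster_id = line.split("\t")
        yield word, count, sentence_id, token_id, cluster_id


def read_annotations(handle):
    """Return a dict mapping cluster ID to concept tag."""
    tags = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        cluster_id, tag = line.split("\t")
        tags[cluster_id] = tag
    return tags


def words_by_concept(cluster_rows, tags):
    """Group words by the concept tag of their cluster."""
    grouped = defaultdict(list)
    for word, _count, _sentence_id, _token_id, cluster_id in cluster_rows:
        # "UNLABELED" is a placeholder chosen for this sketch.
        grouped[tags.get(cluster_id, "UNLABELED")].append(word)
    return dict(grouped)


# Made-up rows standing in for clustering_results.txt and human_annotation.txt.
sample_clusters = io.StringIO(
    "Paris\t3\t12\t4\t7\nBerlin\t1\t40\t2\t7\nran\t2\t5\t1\t19\n"
)
sample_tags = io.StringIO("7\tLOCATION\n")

concepts = words_by_concept(
    read_cluster_rows(sample_clusters), read_annotations(sample_tags)
)
print(concepts)
```

To use it on the real files, pass handles opened with `open(path, encoding="utf-8")` in place of the `io.StringIO` objects.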
        
Code Files:
    code/clustering.py
        - This code performs Agglomerative clustering and outputs each word with its cluster ID
        - Sample command
            python code/clustering.py --vocab-file train-vocab.npy --point-file train-points.npy --output-path save_dir/ --cluster 1000 
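
For orientation, the clustering step described above might look roughly like the following. This is a minimal sketch, not the contents of code/clustering.py: it assumes scikit-learn's `AgglomerativeClustering` with Ward linkage, and it uses small synthetic points and made-up token names in place of the `.npy` files, whose exact layout is not documented here.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering


def cluster_points(points, n_clusters):
    """Return one cluster label per row of `points` using Ward linkage."""
    model = AgglomerativeClustering(n_clusters=n_clusters, linkage="ward")
    return model.fit_predict(points)


# Synthetic stand-in data: three well-separated blobs of 20 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
points = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])
vocab = np.array([f"tok{i}" for i in range(len(points))])  # hypothetical words

labels = cluster_points(points, n_clusters=3)

# Print a few word/cluster pairs, tab separated.
for word, label in list(zip(vocab, labels))[:3]:
    print(f"{word}\t{label}")
```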
        
    code/classification.py
        - This code trains a Logistic Regression classifier on the clustering output, then predicts a cluster for each test vocabulary item and its representation, keeping a prediction only if its probability is greater than the threshold
        - Sample command
            python code/classification.py --train-vocab-file train-vocab.npy --train-point-file train-points.npy --test-vocab-file test-vocab.npy  --test-point-file test-points.npy --clustering-file clusters-1000.txt --output-path save_dir/ --threshold 0.9 
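
The thresholding behaviour described above can be sketched as follows. This is an assumption-laden illustration rather than the actual code/classification.py: it uses scikit-learn's `LogisticRegression`, synthetic two-cluster training data, and returns `None` for tokens whose best probability does not exceed the threshold (how the real script treats such tokens is not stated in this README).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def predict_with_threshold(clf, points, threshold):
    """Return the predicted label per point, or None when the top
    class probability is not greater than `threshold`."""
    proba = clf.predict_proba(points)
    best = proba.argmax(axis=1)
    confident = proba.max(axis=1) > threshold
    return [
        int(clf.classes_[b]) if ok else None
        for b, ok in zip(best, confident)
    ]


# Synthetic training data: two clusters with IDs 0 and 1.
rng = np.random.default_rng(0)
train_points = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(20, 2)),
    rng.normal(loc=10.0, scale=0.5, size=(20, 2)),
])
train_labels = np.array([0] * 20 + [1] * 20)

clf = LogisticRegression().fit(train_points, train_labels)

# One point near each cluster and one ambiguous point halfway between them.
test_points = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 10.0]])
predictions = predict_with_threshold(clf, test_points, threshold=0.9)
print(predictions)
```

The halfway point receives a probability near 0.5 for both clusters, so it falls below the 0.9 threshold and is left unassigned.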
                
    code/evaluate_clustering.py
        - This code evaluates the elbow score for the clustering
        - Sample command
            python code/evaluate_clustering.py --vocab-file train-vocab.npy --point-file train-points.npy --clustering-file clusters-1000.txt
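
This README does not define "elbow score". A common reading, assumed here, is the within-cluster sum of squared distances to each cluster centroid, which is the quantity usually plotted against K in the elbow method. The sketch below computes that quantity with NumPy on a tiny made-up example; it is not taken from code/evaluate_clustering.py.

```python
import numpy as np


def within_cluster_sse(points, labels):
    """Sum of squared distances from each point to its cluster centroid."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for label in np.unique(labels):
        members = points[labels == label]
        centroid = members.mean(axis=0)
        total += float(((members - centroid) ** 2).sum())
    return total


# Four made-up points forming two tight pairs.
points = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])

good = within_cluster_sse(points, [0, 0, 1, 1])  # pairs grouped together
bad = within_cluster_sse(points, [0, 1, 0, 1])   # pairs split apart
print(good, bad)
```

A lower value indicates tighter clusters; comparing the value across several choices of K is what reveals the "elbow".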