
Project Name: SkillBERT

Step 1 - Package installation
pip3.6 install -r requirements.txt

Step 2 - Running the code
Download the zip archive containing all the data and other files from https://www.dropbox.com/s/wcg8kbq5btl4gm0/code_data_pickle_files.zip?dl=0 and extract it.
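The extraction step can be sketched in Python as follows. This is a minimal sketch, not part of the released code: it assumes the archive was saved locally as "code_data_pickle_files.zip" (the name that appears in the URL), and the output directory "skillbert_files" and both helper functions are hypothetical names chosen for illustration. It unpacks the archive and reports which of the five folders described in the overview are missing.

```python
import zipfile
from pathlib import Path

# Folder names taken from the overview in this README.
EXPECTED_FOLDERS = ["training_codes", "feature_creation", "feature_data", "model", "dataset"]


def extract_archive(zip_path, dest_dir):
    """Unpack the downloaded archive into dest_dir and return that path."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(str(zip_path)) as archive:
        archive.extractall(str(dest))
    return dest


def missing_folders(root):
    """Return the expected folder names not found anywhere under root."""
    # Search recursively, since the archive may wrap everything in a top-level folder.
    found = {p.name for p in Path(root).rglob("*") if p.is_dir()}
    return [name for name in EXPECTED_FOLDERS if name not in found]


if __name__ == "__main__":
    archive_path = Path("code_data_pickle_files.zip")  # assumed local file name
    if archive_path.exists():
        root = extract_archive(archive_path, "skillbert_files")  # hypothetical output directory
        print("Missing folders:", missing_folders(root) or "none")
    else:
        print("Archive not found; download it first.")
```

The check searches recursively because the exact layout inside the archive is not stated here.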
An overview of all the folders present is given below:
2.1 training_codes: This folder contains the main Python files used for running the experiments mentioned in the paper. Inside the main() method there are functions for data preparation,
 training, and testing. We have provided comments in each section for a better understanding of the modules. The code in the file "skillbert_spectral_clustering.py" is used to
 train the Bi-LSTM model on SkillBERT and spectral clustering related features, which gave us the best performance. You can jump directly to this code if you don't want to run the other
 intermediate experiments. The experiment for classifying a skill into core and fringe can be run using "3_class_classifier.py".
 If you want to run the other experiments mentioned in the paper, you can do so by running:
   "word2vec_only.py" for classifying skills using only the Word2vec model
   "skillbert.py" for classifying skills using only the SkillBERT model
   "bert_pretrain_only.py" for classifying skills using only the pre-trained BERT model
   "skillbert_and_kmeans.py" for classifying skills using SkillBERT and k-means on the SkillBERT embeddings

2.2 feature_creation: This folder contains the code for creating the features used for training the models. If you don't want to go through each code file, the features created
    by these files are already available in the feature_data folder. The code in the training_codes folder also uses these CSV files directly for model training.
2.3 feature_data: As mentioned above, this folder contains the CSV files of features generated by the code in the feature_creation folder.
2.4 model: This folder contains the final trained models from the experiments mentioned in the paper. The folder "skill_bert_spectral_clustering" contains the Bi-LSTM model that
    has been used as the final model.
2.5 dataset: This folder contains the final training and testing data used for each experiment. You can use these files to directly test the corresponding model.
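As a convenience, the experiment scripts named under training_codes can be launched from a small Python wrapper. The sketch below is illustrative only: it assumes each script takes no command-line arguments and is meant to be run with training_codes as the working directory, neither of which is stated in this README. The helper names build_command and run_experiment are hypothetical.

```python
import subprocess
import sys
from pathlib import Path

# Script names as given in this README; the descriptions paraphrase it.
EXPERIMENTS = {
    "skillbert_spectral_clustering.py": "Bi-LSTM on SkillBERT and spectral clustering features (best performance)",
    "3_class_classifier.py": "classifying a skill into core and fringe",
    "word2vec_only.py": "Word2vec model only",
    "skillbert.py": "SkillBERT model only",
    "bert_pretrain_only.py": "pre-trained BERT model only",
    "skillbert_and_kmeans.py": "SkillBERT and k-means on SkillBERT embeddings",
}


def build_command(script_name):
    """Return the argument list for one experiment; reject unknown script names."""
    if script_name not in EXPERIMENTS:
        raise KeyError("Unknown experiment script: " + script_name)
    # sys.executable keeps the run on the interpreter the packages were installed for.
    return [sys.executable, script_name]


def run_experiment(script_name, code_dir="training_codes"):
    """Run one script with code_dir as the working directory (an assumption)."""
    return subprocess.run(build_command(script_name), cwd=str(Path(code_dir)), check=True)


if __name__ == "__main__":
    # Only list the experiments here; call run_experiment(...) explicitly to train.
    for name, description in EXPERIMENTS.items():
        print(name, "->", description)
```

For example, run_experiment("skillbert_spectral_clustering.py") would start the best-performing configuration, provided the assumptions above hold for your copy of the code.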

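Because the training code reads the CSV files in feature_data directly, it can be useful to check which feature files are present and what columns they contain before starting a run. The following is a minimal sketch using only the standard library. It assumes the files are UTF-8 encoded and that their first row is a header, neither of which is confirmed here; csv_headers is a hypothetical helper name.

```python
import csv
from pathlib import Path


def csv_headers(folder):
    """Map each CSV file name in folder to its first row (assumed to be the header)."""
    headers = {}
    folder_path = Path(folder)
    if not folder_path.is_dir():
        return headers
    for path in sorted(folder_path.glob("*.csv")):
        # UTF-8 is an assumption; adjust the encoding if a file fails to decode.
        with open(str(path), newline="", encoding="utf-8") as handle:
            headers[path.name] = next(csv.reader(handle), [])
    return headers


if __name__ == "__main__":
    for name, columns in csv_headers("feature_data").items():
        print(name, columns)
```

The same helper can be pointed at the dataset folder if its training and testing files are also stored as CSV.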