1. Run create_raw.py to download the original corpus and do basic preprocessing for the later steps.
2. Use JavaExtractor to extract the ASTs.
3. Run process.py to process the dataset; the processed dataset will be stored in "java".

JavaExtractor comes from https://github.com/tech-srl/code2seq/tree/master/JavaExtractor.
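
The three steps above could be chained with a small driver script. This is only a sketch under assumptions: it assumes create_raw.py and process.py can be run from the current directory without arguments, and the JavaExtractor command is passed in by the caller because its exact invocation is not given here. The names pipeline_steps and run_pipeline are hypothetical helpers, not part of this repository.

```python
import subprocess
import sys


def pipeline_steps(extractor_cmd):
    """Return the three steps, in order, as (description, command) pairs.

    extractor_cmd is whatever command runs JavaExtractor on your machine;
    it is supplied by the caller because it is not specified in this README.
    """
    return [
        # Assumption: create_raw.py needs no command-line arguments.
        ("download and preprocess the raw corpus", [sys.executable, "create_raw.py"]),
        ("extract ASTs with JavaExtractor", list(extractor_cmd)),
        # Assumption: process.py needs no command-line arguments.
        ("process the dataset", [sys.executable, "process.py"]),
    ]


def run_pipeline(steps, dry_run=True):
    """Print each step and, unless dry_run is set, execute it.

    check=True stops the pipeline at the first failing step, so a later
    step never runs on incomplete output from an earlier one.
    """
    for number, (description, command) in enumerate(steps, start=1):
        print(f"Step {number}: {description}: {' '.join(command)}")
        if not dry_run:
            subprocess.run(command, check=True)
```

For example, run_pipeline(pipeline_steps(["java", "-jar", "JavaExtractor.jar"])) only prints the planned commands; the jar name there is a placeholder. Pass dry_run=False to actually execute the steps.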