# Data Pre-processing

Run `getdata.sh` in this directory to download the `SIGHAN2005` data and process the raw `SIGHAN2005` and `CTB6` data.

The script does not download `CTB6`. Because it also processes `CTB6`, obtain the official `CTB6` data yourself and place the `LDC07T36` folder in this directory before running the script.

After `getdata.sh` finishes, the processed data is in the `../data` directory, grouped by dataset.
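
The steps above can be sketched as a small pre-flight check. This is only an illustration: `check_prereqs` is a hypothetical helper that is not part of this repository, and invoking the script with `sh` is an assumption (it may need `bash` or a direct `./getdata.sh`).

```shell
# Hypothetical helper (not part of this repository): verify that the inputs
# named in this README exist before running getdata.sh.
check_prereqs() {
    dir="${1:-.}"
    status=0
    # The download/processing script itself.
    if [ ! -f "$dir/getdata.sh" ]; then
        echo "missing: $dir/getdata.sh" >&2
        status=1
    fi
    # The official CTB6 folder, which must be obtained separately.
    if [ ! -d "$dir/LDC07T36" ]; then
        echo "missing: $dir/LDC07T36 (official CTB6 data)" >&2
        status=1
    fi
    return "$status"
}

# Usage, from this directory (the interpreter is an assumption):
#   check_prereqs . && sh getdata.sh && ls ../data
```

If the check passes and the script completes, listing `../data` should show the processed output grouped by dataset.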
