# HIR-16K Dataset

Because the full dataset is larger than 50 MB, only a few samples are provided for reference in `HIR-16K.jsonl`. We will make the full dataset publicly available upon acceptance. The distribution of the whole training dataset is shown below.

![image](../images/dataset.png)
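
To inspect the reference samples, the file can be read as JSON Lines. The following is a minimal sketch, assuming `HIR-16K.jsonl` sits in the current working directory and holds one JSON value per line. The helper name `load_jsonl` is hypothetical, and no field names are assumed; the script only prints whatever keys the first record has.

```python
import json
from pathlib import Path


def load_jsonl(path):
    """Read a JSON Lines file and return its records as a list.

    Assumes one JSON value per line; blank lines are skipped.
    """
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


if __name__ == "__main__":
    # Assumed location of the sample file; adjust to your checkout.
    sample_path = Path("HIR-16K.jsonl")
    if sample_path.exists():
        samples = load_jsonl(sample_path)
        print(f"Loaded {len(samples)} samples")
        # Show the keys of the first record without assuming a schema.
        if samples and isinstance(samples[0], dict):
            print(sorted(samples[0].keys()))
```

Reading line by line keeps the same code usable for the full dataset once it is released, since each record is parsed independently.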