We provide the samples used for human evaluation of each method.
Because of the upload size limit for submissions, we include only 80 of the 100 evaluated samples.
