WYWEB: A Classical Chinese NLP Evaluation Benchmark


16 Oct 2022 (modified: 05 May 2023) | ACL ARR 2022 October Blind Submission | Readers: Everyone
Abstract: In natural language processing (NLP), evaluation benchmarks such as GLUE and SuperGLUE allow researchers to evaluate new models on a shared set of tasks. For Chinese natural language understanding (NLU), the CLUE benchmark brings together more than 10 tasks, benefiting researchers working on Chinese. However, CLUE does not cover Classical Chinese, also known as “wen yan wen” (文言文), which has a history of thousands of years and attracts researchers from all over the world. To support this community, we introduce the WYWEB evaluation benchmark, which contains eight tasks covering sentence classification, sequence labeling, reading comprehension, and machine translation. All of the tasks are designed according to the actual requirements of domain researchers and students. The GitHub repository and leaderboard of WYWEB will be released upon acceptance.
Paper Type: long
Research Area: Resources and Evaluation