Ancient Chinese Machine Reading Comprehension Exception Question Dataset with a Non-trivial Model

Published: 01 Jan 2023 · Last Modified: 18 Apr 2025 · PRICAI (2) 2023 · License: CC BY-SA 4.0
Abstract: Ancient Chinese Reading Comprehension (ACRC) is challenging because of the absence of datasets and the difficulty of understanding ancient languages. Among ACRC tasks, entire-span-regarded (Entire spaN regarDed, END) questions are especially exhausting because of the input-length limitation of seminal BERTs, which solve modern-language reading comprehension expeditiously. To alleviate the dataset-absence issue, this paper builds a new dataset, ACRE (Ancient Chinese Reading-comprehension End-question). To tackle long inputs, this paper proposes a non-trivial model, named EVERGREEN (EVidence-first bERt encodinG with entiRE-tExt coNvolution), which is based on the convolution of multiple encoders that are BERT descendants. Besides proving the effectiveness of compressing encodings via convolution, our experimental results also show that, for ACRC, first, neither pre-trained ancient-Chinese language models nor long-text-oriented transformers realize their value; second, the top evidence sentence together with distributed sentences works better than the top-n evidence sentences as input to EVERGREEN; third, compared with its variants, including dynamic convolution and multi-scale convolution, classical convolution is the best.
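The abstract does not spell out the architecture, but the core idea (split a long passage into segments, encode each segment with a shared BERT-family encoder, and compress the stacked segment encodings with a classical convolution before scoring answer choices) can be sketched as follows. This is a minimal illustration assuming a PyTorch/transformers setup; the class name SegmentConvReader, the backbone bert-base-chinese, and all hyperparameters are hypothetical and not taken from the paper, and no evidence-sentence selection step is shown.

```python
import torch
import torch.nn as nn
from transformers import BertModel


class SegmentConvReader(nn.Module):
    """Illustrative sketch: encode each passage segment with a shared BERT
    encoder, stack the per-segment [CLS] vectors, and compress them with a
    classical 1D convolution before a multiple-choice scoring head."""

    def __init__(self, bert_name="bert-base-chinese", kernel_size=3):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)
        hidden = self.bert.config.hidden_size
        # Classical 1D convolution over the segment axis compresses the
        # sequence of segment encodings into a fixed-size representation.
        self.conv = nn.Conv1d(hidden, hidden, kernel_size=kernel_size,
                              padding=kernel_size // 2)
        self.scorer = nn.Linear(hidden, 1)

    def forward(self, input_ids, attention_mask):
        # input_ids / attention_mask: (batch, num_choices, num_segments, seq_len)
        b, c, s, l = input_ids.shape
        flat_ids = input_ids.view(b * c * s, l)
        flat_mask = attention_mask.view(b * c * s, l)
        # [CLS] vector of each (choice, segment) pair from the shared encoder.
        cls = self.bert(input_ids=flat_ids,
                        attention_mask=flat_mask).last_hidden_state[:, 0]
        cls = cls.view(b * c, s, -1).transpose(1, 2)   # (b*c, hidden, segments)
        pooled = self.conv(cls).max(dim=-1).values     # compress the segment axis
        logits = self.scorer(pooled).view(b, c)        # one score per answer choice
        return logits
```

Under this sketch, each answer choice is paired with the (segmented) passage and question, the per-choice scores are compared with a softmax over choices, and training uses a standard cross-entropy loss against the correct choice; the paper's variants (dynamic and multi-scale convolution) would replace the `nn.Conv1d` layer.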
