Effective and Efficient Word-type aware Chinese Lexical Simplification


16 Dec 2023
ACL ARR 2023 December Blind Submission
Readers: Everyone
Abstract: We address Chinese lexical simplification (CLS), the task of replacing complex words in a sentence with simpler alternatives of equivalent meaning. We propose an effective and efficient CLS system that combines small and large models in a complementary way, based on the type of each complex word. We first analyze the strengths and weaknesses of small models and ChatGPT. ChatGPT performs well at simplifying in-dictionary common words and Chinese idioms, whereas small models struggle with them. We therefore propose an automatic knowledge distillation approach that fine-tunes small models on training data for in-dictionary words generated by ChatGPT. Both small models and ChatGPT, however, have difficulty with out-of-dictionary (OOD) words. To address this, we use a retrieval-based interpretation augmentation strategy that enriches the input with relevant information from external sources; with it, both small models and ChatGPT improve significantly on OOD words. Finally, we introduce a simple controller that selects the best model or tool for each complex word according to its type. This hybrid approach balances performance and cost and achieves better results than any single model.
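The word-type controller described in the abstract can be pictured with a minimal sketch. Everything below is an illustrative assumption rather than the authors' implementation: the three type labels, the `classify_word` and `make_controller` helpers, the stub simplifiers, the toy word lists, and in particular the mapping from word type to model, which the abstract does not spell out.

```python
from typing import Callable, Dict, Set

# A simplifier takes (sentence, complex_word) and returns a substitute string.
Simplifier = Callable[[str, str], str]


def classify_word(word: str, dictionary: Set[str], idioms: Set[str]) -> str:
    """Assign a coarse word type (labels are hypothetical)."""
    if word in idioms:  # checked first: idioms may also appear in the dictionary
        return "idiom"
    if word in dictionary:
        return "in_dictionary"
    return "out_of_dictionary"


def make_controller(routes: Dict[str, Simplifier],
                    dictionary: Set[str],
                    idioms: Set[str]) -> Simplifier:
    """Build a simplifier that dispatches each word to the route for its type."""
    def simplify(sentence: str, word: str) -> str:
        return routes[classify_word(word, dictionary, idioms)](sentence, word)
    return simplify


# Toy resources and stub models, standing in for real components.
DICTIONARY = {"迅速", "画蛇添足"}
IDIOMS = {"画蛇添足"}
GLOSSARY = {"内卷": "过度竞争"}  # stands in for retrieved interpretations


def distilled_small_model(sentence: str, word: str) -> str:
    return f"small({word})"


def large_model(sentence: str, word: str) -> str:
    return f"large({word})"


def retrieval_augmented_model(sentence: str, word: str) -> str:
    # Enrich the input with an interpretation before calling the (stub) model.
    return f"small({word}|{GLOSSARY.get(word, '')})"


# Assumed routing; the actual assignment of types to models may differ.
controller = make_controller(
    {
        "in_dictionary": distilled_small_model,
        "idiom": large_model,
        "out_of_dictionary": retrieval_augmented_model,
    },
    DICTIONARY,
    IDIOMS,
)
```

Calling `controller(sentence, word)` classifies the word and forwards it to one route, so the more expensive model is invoked only for the word types assigned to it, which is one way to read the abstract's balance of performance and cost.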
Paper Type: long
Research Area: Semantics: Lexical
Contribution Types: Model analysis & interpretability, Approaches to low-resource settings, Data analysis
Languages Studied: Chinese
