Automatic Identification of Chinese Modality Based on Pre-trained Language Models

ACL ARR 2025 May Submission 6106 Authors

20 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0
Abstract: Recognizing the modality of utterances is crucial to NLP tasks that require a deep understanding of semantics and pragmatics, including semantic inference and dialogue systems. This paper focuses on the automatic identification of Chinese language modality with different machine learning models, including classic models, pre-trained transformers, and Large Language Models (LLMs). We conduct experiments on a Chinese dataset that we annotate with four types of modality. The results show that the fine-tuned BERT model achieves the best performance, with an F1 score of 0.74, significantly outperforming the LLMs and other models. The study reveals the difficulty of the task: although LLMs have demonstrated exceptional performance across a wide range of NLP tasks, their ability to handle tasks that rely heavily on semantic and pragmatic understanding remains limited, underscoring the need for further work on improving NLP models, including LLMs, for this task.
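The abstract does not spell out the fine-tuning setup, so the following is only a minimal sketch of how a four-way Chinese modality classifier of the kind described could be fine-tuned with Hugging Face Transformers. The checkpoint name (bert-base-chinese), the label set, the example sentences, and all hyperparameters are illustrative assumptions, not the authors' actual data or configuration.

```python
# Sketch: fine-tune a Chinese BERT encoder with a 4-way classification head.
# All names, labels, and examples below are assumptions for illustration only.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Hypothetical modality labels; the paper annotates four types but does not list them here.
LABELS = ["epistemic", "deontic", "dynamic", "none"]

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")  # assumed checkpoint
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-chinese", num_labels=len(LABELS)
)

# Toy training pairs (sentence, label index); the real setup would use the annotated corpus.
examples = [
    ("他可能已经离开了。", 0),   # "He may have already left." -> epistemic (assumed)
    ("你必须按时交作业。", 1),   # "You must hand in the homework on time." -> deontic (assumed)
]

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for epoch in range(3):
    for text, label in examples:
        enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
        out = model(**enc, labels=torch.tensor([label]))
        out.loss.backward()
        optimizer.step()
        optimizer.zero_grad()

# Inference: predict the modality of a new utterance.
model.eval()
with torch.no_grad():
    enc = tokenizer("他应该明天就到。", return_tensors="pt")
    pred = model(**enc).logits.argmax(dim=-1).item()
    print(LABELS[pred])
```

In practice the reported F1 of 0.74 would be computed on a held-out test split of the annotated dataset; the toy loop above only illustrates the classification-head formulation of the task.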
Paper Type: Long
Research Area: Semantics: Lexical and Sentence-Level
Research Area Keywords: Chinese modality classification, Bi-LSTM, BERT, LLMs, Machine learning
Contribution Types: Model analysis & interpretability, Publicly available software and/or pre-trained models, Data resources, Data analysis
Languages Studied: Chinese
Submission Number: 6106