CRSA: A Chinese Single-Domain Task-Oriented Dialogue Dataset with Contextual Rich Semantic Annotations

Xuefei Wang; Jun Han; Qitong Sun

18 Sept 2025 (modified: 11 Feb 2026). Submitted to ICLR 2026. License: CC BY 4.0

Keywords: Task-Oriented Dialogue, Multi-turn Dialogue Dataset, Benchmark, Semantic Annotation, LLM Training Resource, Dialogue System Training, TOD Subtask Evaluation

Abstract: Task-oriented dialogue (TOD) systems help users achieve domain-specific goals through natural language interaction and depend critically on high-quality datasets. However, existing datasets often lack authenticity, fine-grained semantic annotations, and explicit process control, which limits their effectiveness in complex business scenarios. To address these limitations, we introduce CRSA, a Chinese TOD dataset that integrates diverse sources to construct semantically rich, structurally realistic dialogues and adopts a multi-level annotation framework to model dialogue acts, user intents, and task flows more effectively. To evaluate the quality and application potential of CRSA, we conduct three sets of experiments spanning data quality, system training effectiveness, and task adaptability. The results demonstrate that CRSA provides strong support for process modeling, strategy learning, and response generation, establishing it as a robust and versatile resource for TOD research. The dataset is publicly available at https://anonymous.4open.science/r/CRSA-CBBB.

Primary Area: datasets and benchmarks

Submission Number: 10212
