ChinaTravel: A Real-World Benchmark for Language Agents in Chinese Travel Planning

ACL ARR 2025 February Submission4802 Authors

16 Feb 2025 (modified: 09 May 2025), ACL ARR 2025 February Submission, License: CC BY 4.0

Abstract: Recent advances in LLMs, particularly in language reasoning and tool integration, have rapidly sparked the real-world development of Language Agents. Among these, travel planning represents a prominent domain, combining complex multi-objective planning challenges with practical deployment demands. Existing benchmarks, however, often oversimplify real-world requirements by focusing on synthetic queries and limited constraints. To address this gap, we introduce ChinaTravel, the first benchmark designed for authentic Chinese travel planning scenarios. We collect travel requirements from questionnaires and propose a compositionally generalizable domain-specific language that enables a scalable evaluation process covering feasibility, constraint satisfaction, and preference comparison. Empirical studies reveal the potential of neuro-symbolic agents in travel planning: they achieve a 27.9% constraint satisfaction rate on human queries, a 10.7× improvement over purely neural models (2.6%). Moreover, we identify key challenges in real-world deployments, including open language reasoning and unseen concept composition. These findings highlight the significance of ChinaTravel as a pivotal milestone for advancing language agents in complex, real-world planning scenarios.

Paper Type: Long

Research Area: Resources and Evaluation

Research Area Keywords: benchmarking; evaluation; applications

Contribution Types: NLP engineering experiment, Data resources

Languages Studied: Chinese

Submission Number: 4802
