ChinaTravel: An Open-Ended Benchmark for Language Agents in Chinese Travel Planning

Jie-Jing Shao; Bo-Wen Zhang; Xiao-Wen Yang; Baizhi Chen; Siyu Han; Wen-Da Wei; Guohao Cai; Zhenhua Dong; Lan-Zhe Guo; Yu-Feng Li

Published: 24 Sept 2025, Last Modified: 09 Oct 2025

Venue: NeurIPS 2025 LLM Evaluation Workshop (Poster)

License: CC BY 4.0

Keywords: Benchmark, Language Agent, Large Language Model, Travel Planning, Neuro-Symbolic Learning

TL;DR: We present ChinaTravel, a travel planning benchmark incorporating a scalable evaluation framework and real human requirements, demonstrating the capability of neuro-symbolic methods in satisfying complex constraints.

Abstract: Recent advances in LLMs have spurred the development of Language Agents for real-world applications such as travel planning, which involves complex multi-constraint challenges. Existing benchmarks, however, often oversimplify reality with synthetic queries and limited constraints. To bridge this gap, we introduce ChinaTravel, the first open-ended benchmark based on authentic travel needs. We develop a domain-specific language (DSL) for compositional evaluation covering feasibility, constraints, and preferences. Experiments show neuro-symbolic agents achieve a 37.0% constraint satisfaction rate on human queries, a 10× improvement over neural models, demonstrating their potential in complex planning scenarios.

Submission Number: 229
