Keywords: Benchmark, Language Agent, Large Language Model, Travel Planning, Neuro-Symbolic Learning
TL;DR: We present ChinaTravel, a travel planning benchmark incorporating a scalable evaluation framework and real human requirements, demonstrating the capability of neuro-symbolic methods in satisfying complex constraints.
Abstract: Recent advances in LLMs have spurred the development of Language Agents for real-world applications such as travel planning, which involves complex multi-constraint challenges. Existing benchmarks, however, often oversimplify reality with synthetic queries and limited constraints. To bridge this gap, we introduce ChinaTravel, the first open-ended benchmark based on authentic travel needs. We develop a domain-specific language (DSL) for compositional evaluation covering feasibility, constraints, and preferences. Experiments show neuro-symbolic agents achieve a 37.0% constraint satisfaction rate on human queries, a 10× improvement over neural models, demonstrating their potential in complex planning scenarios.
Submission Number: 229