Large Language Models Can Plan Your Travels Rigorously with Formal Verification Tools

ACL ARR 2024 June Submission 3811 Authors

16 Jun 2024 (modified: 18 Jul 2024) | ACL ARR 2024 June Submission | CC BY 4.0

Abstract: Xie et al. (2024) proposed TravelPlanner, a U.S. domestic travel planning benchmark, and showed that LLMs on their own cannot produce travel plans that satisfy user requirements, with a best success rate of 0.6%. State-of-the-art methods that combine LLMs with external critics, verifiers, and humans raise the success rate only to 20% (Kambhampati et al.). In this work, we propose a framework that enables LLMs to formally formulate combinatorial search problems, such as travel planning, as satisfiability modulo theories (SMT) problems, and to use SMT solvers to solve them automatically and interactively. The SMT solver is guaranteed to find a plan whenever the input constraints are satisfiable. When the input constraints are unsatisfiable, our LLM-based framework interactively and adaptively offers modification suggestions to users, drawing on the SMT solver's ability to identify the unsatisfiable core. We evaluate our framework on TravelPlanner and achieve a success rate of 97% for satisfiable queries. We also create a separate dataset of international travel benchmarks and show that, when initial user queries are unsatisfiable, our interactive planning framework generates valid plans with an average success rate of 78.6% on the international travel benchmark and 85.0% on TravelPlanner, across diverse human preferences. We show that our framework achieves zero-shot generalization to unseen constraints in travel planning problems. In addition, we introduce four new combinatorial optimization tasks and show that our framework generalizes well to these new domains in a zero-shot manner.
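The solve-then-relax loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: an exhaustive search over a tiny finite domain stands in for a real SMT solver, and a deletion-based shrink stands in for the solver's unsatisfiable-core extraction. All names and numbers (`OPTIONS`, `HOTEL`, `FLIGHT`, `QUERY`, `solve`, `unsat_core`) are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical toy data, not taken from the paper or from TravelPlanner.
OPTIONS = {"city": ["Austin", "Boston", "Chicago"], "nights": [2, 3, 4]}
HOTEL = {"Austin": 90, "Boston": 180, "Chicago": 130}    # cost per night
FLIGHT = {"Austin": 200, "Boston": 350, "Chicago": 150}  # round-trip cost


def total_cost(plan):
    """Flight cost plus hotel cost for the whole stay."""
    return FLIGHT[plan["city"]] + HOTEL[plan["city"]] * plan["nights"]


# Named constraints, as an LLM might extract them from a user query (assumed).
QUERY = {
    "budget<=500": lambda p: total_cost(p) <= 500,
    "city==Boston": lambda p: p["city"] == "Boston",
    "nights>=3": lambda p: p["nights"] >= 3,
}


def solve(constraints):
    """Return the first plan satisfying every constraint, or None.

    Exhaustive search is a stand-in for an SMT solver's satisfiability check.
    """
    keys = list(OPTIONS)
    for values in product(*(OPTIONS[k] for k in keys)):
        plan = dict(zip(keys, values))
        if all(check(plan) for check in constraints.values()):
            return plan
    return None


def unsat_core(constraints):
    """Shrink an unsatisfiable set to a minimal conflicting subset.

    Deletion-based: drop a constraint whenever the rest is still unsatisfiable.
    """
    core = dict(constraints)
    for name in list(core):
        trial = {k: v for k, v in core.items() if k != name}
        if solve(trial) is None:
            core = trial
    return sorted(core)


if __name__ == "__main__":
    print(solve(QUERY))       # no plan satisfies all three constraints
    print(unsat_core(QUERY))  # the constraints that conflict with each other
    relaxed = {k: v for k, v in QUERY.items() if k != "city==Boston"}
    print(solve(relaxed))     # a plan found after the user relaxes one constraint
```

In this toy query the budget and destination constraints conflict, so a suggestion to the user would target one of those two rather than the length of stay. A real SMT solver would report the core directly instead of re-solving once per constraint.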

Paper Type: Long

Research Area: NLP Applications

Research Area Keywords: LLM Planning, LLM Tool-Use, Code Generation and Understanding, Human-AI Interaction

Contribution Types: NLP engineering experiment

Languages Studied: English

Submission Number: 3811
