PricingLogic: Evaluating LLMs Reasoning on Complex Tourism Pricing Tasks

ACL ARR 2025 May Submission 6285 Authors

20 May 2025 (modified: 29 Jul 2025), ACL ARR 2025 May Submission, CC BY 4.0

Abstract: We present PricingLogic, the first benchmark that probes whether Large Language Models (LLMs) can reliably automate price calculation for tourism bookings when multiple, overlapping fare rules apply. Travel agencies are eager to offload this error-prone task to AI systems; however, deploying LLMs without verified reliability could cause significant financial losses and erode customer trust. PricingLogic comprises 300 natural-language booking requests derived from 42 real-world pricing policies, spanning two levels of difficulty: (i) basic customer-type pricing and (ii) bundled-tour calculations involving interacting discounts. Evaluations of a range of LLMs reveal a steep performance drop on the harder tier, exposing systematic failures in rule interpretation and arithmetic reasoning. These results show that, despite their general capabilities, today’s LLMs remain unreliable for revenue-critical applications without further safeguards or domain adaptation.

Paper Type: Short

Research Area: NLP Applications

Research Area Keywords: financial/business NLP

Contribution Types: Data resources, Data analysis

Languages Studied: English, Chinese

Submission Number: 6285