TourismNLG: A Multi-lingual Generative Benchmark for the Tourism Domain

Published: 01 Jan 2023, Last Modified: 20 May 2025ECIR (1) 2023EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: The tourism industry is important for the benefits it brings and due to its role as a commercial activity that creates demand and growth for many more industries. Yet there is not much work on data science problems in tourism. Unfortunately, there is not even a standard benchmark for evaluation of tourism-specific data science tasks and models. In this paper, we propose a benchmark, TourismNLG, of five natural language generation (NLG) tasks for the tourism domain and release corresponding datasets with standard train, validation and test splits. Further, previously proposed data science solutions for tourism problems do not leverage the recent benefits of transfer learning. Hence, we also contribute the first rigorously pretrained mT5 and mBART model checkpoints for the tourism domain. The models have been pretrained on four tourism-specific datasets covering different aspects of tourism. Using these models, we present initial baseline results on the benchmark tasks. We hope that the dataset will promote active research for natural language generation for travel and tourism. (https://drive.google.com/file/d/1tux19cLoXc1gz9Jwj9VebXmoRvF9MF6B/.)
Loading