Keywords: Text-to-TrajVis; Text-to-TrajVis Benchmark; Natural Language Processing; Trajectory Visualization Language; Large Language Models
Abstract: This paper introduces the Text-to-TrajVis task, which aims to transform natural language questions into trajectory data visualizations, facilitating the development of natural language interfaces for trajectory visualization systems. As this is a novel task, no relevant dataset is currently available in the community. To address this gap, we first devise a new visualization language, the Trajectory Visualization Language (TVL), for querying trajectory data and generating visualizations. Building on this foundation, we propose a dataset construction method that combines Large Language Models (LLMs) with human effort to create high-quality data. Specifically, we design a four-stage pipeline: candidate extraction, seed TVL generation, tree-based expansion, and LLM-driven question creation followed by human validation. This process yields the first large-scale Text-to-TrajVis dataset, named TrajVL, which contains 9,608 (question, TVL) pairs. We further propose TRCAT, a framework that progressively converts natural language questions into TVLs. The framework incorporates a TVL-RAG Chain Module and an Area-Time Standardization Module, which significantly improve the accuracy of LLMs in TVL generation. Using the TrajVL dataset, we conduct a comprehensive evaluation of TRCAT across several mainstream LLMs (e.g., GPT, Qwen, LLaMA, and Gemma). Finally, we establish a benchmark for this task, providing a foundation for future research in structured trajectory language generation.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: NLP datasets; benchmarking; automatic creation and evaluation of language resources; evaluation
Contribution Types: Data resources
Languages Studied: English
Submission Number: 3478