Large Language Model-Informed Geometric Trajectory Embedding for Driving Scenario Retrieval

Published: 2025, Last Modified: 28 Jan 2026, VEHITS 2025, CC BY-SA 4.0
Abstract: This paper introduces a Large Language Model-informed geometric embedding for retrieving behavioural driving scenarios from unlabelled trajectory data, with the aim of improving the search of real driving data for scenario-based testing. A Variational Recurrent Autoencoder with a Hausdorff Distance-based loss generates trajectory embeddings that capture detailed spatial patterns and interactions, offering better interpretability than traditional models trained with a mean squared error loss. The embeddings are then organised through unsupervised clustering with HDBSCAN, which groups scenarios by similarity at the scene, infrastructure, behaviour, and interaction levels. By using GPT-4o to describe scenarios, clusters, and inter-cluster relationships, the approach supports targeted scenario retrieval through a Graph Retrieval-Augmented Generation pipeline, allowing unlabelled trajectories to be searched in natural language. Evaluation demonstrates a retrieval precision of 80.2% for behavioural queries involving
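The abstract does not give the exact form of the Hausdorff Distance-based loss. As a minimal sketch, assume each trajectory is a sequence of 2D (x, y) positions treated as a point set. The code below computes the standard symmetric Hausdorff distance and a hypothetical batch loss, `hausdorff_loss`, defined as the mean distance over target/reconstruction pairs. This is not the authors' implementation. A trainable VRAE would need a differentiable tensor version rather than this pure-Python one.

```python
import math
from typing import Sequence, Tuple

# Assumption: a trajectory is a sequence of 2D (x, y) positions.
Point = Tuple[float, float]


def directed_hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Largest distance from any point in `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Hausdorff distance: the larger of the two directed distances."""
    if not a or not b:
        raise ValueError("trajectories must be non-empty")
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


def hausdorff_loss(
    targets: Sequence[Sequence[Point]],
    reconstructions: Sequence[Sequence[Point]],
) -> float:
    """Hypothetical batch loss: mean Hausdorff distance over trajectory pairs."""
    if len(targets) != len(reconstructions) or not targets:
        raise ValueError("need equally sized, non-empty batches")
    total = sum(hausdorff(t, r) for t, r in zip(targets, reconstructions))
    return total / len(targets)


# Example: a straight path and a reconstruction with one 0.5 m lateral deviation.
ego = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
recon = [(0.0, 0.0), (1.0, 0.5), (2.0, 0.0), (3.0, 0.0)]
print(hausdorff(ego, recon))  # 0.5
```

Because the Hausdorff distance reports the worst spatial mismatch between two paths in the same units as the positions, a single value such as the 0.5 above can be read directly as "the reconstruction strays at most 0.5 units from the original path". This is one way to understand the interpretability argument made in the abstract.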