Abstract: This paper introduces a Large Language Model-informed geometric embedding for retrieving behavioural driving scenarios from unlabelled trajectory data, aimed at improving the search of real driving data for scenario-based testing. A Variational Recurrent Autoencoder with a Hausdorff Distance-based loss generates trajectory embeddings that capture detailed spatial patterns and interactions, offering greater interpretability than traditional mean squared error-based models. The embeddings are then organised through unsupervised clustering with HDBSCAN, which groups scenarios by similarity at the scene, infrastructure, behaviour, and interaction levels. With GPT-4o describing scenarios, clusters, and inter-cluster relationships, the approach supports targeted scenario retrieval via a Graph Retrieval-Augmented Generation pipeline, enabling natural language search of unlabelled trajectories. Evaluation demonstrates a retrieval precision of 80.2% for behavioural queries involving
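To make the Hausdorff Distance mentioned above concrete, the following minimal sketch computes the symmetric Hausdorff distance between two 2D trajectories treated as point sets. It is an illustration only: the function names and the toy trajectories are hypothetical, and this is the plain, non-differentiable distance, not the training loss used in the paper, whose exact formulation is not given in the abstract.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def directed_hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Largest distance from any point in `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Hausdorff distance: the larger of the two directed distances."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Hypothetical toy trajectories: a straight path and one that swerves sideways.
straight = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
swerve = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]

print(hausdorff(straight, swerve))    # 1.0, the size of the lateral deviation
print(hausdorff(straight, straight))  # 0.0, identical shapes
```

Because the distance is driven by the worst-case geometric deviation between the two shapes rather than by an average over time-aligned points, it responds directly to spatial differences such as the swerve in this example.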
External IDs: dblp:conf/vehits/SohnDBSEAS25