Keywords: Imitation Learning, Generalisation, Environment, Benchmark
Abstract: Imitation learning benchmarks often lack sufficient variation between training and evaluation settings, limiting meaningful assessment of generalisation.
We introduce Labyrinth, a benchmarking environment designed to test generalisation with precise control over structure, start and goal positions, and task complexity.
It enables verifiably distinct training, evaluation, and test settings.
Labyrinth provides a discrete, fully observable state space and known optimal actions, supporting interpretability and fine-grained evaluation.
Its flexible setup allows targeted testing of individual generalisation factors and includes variants such as partial observability, key-and-door tasks, and ice-floor hazards.
By enabling controlled, reproducible experiments, Labyrinth advances the evaluation of generalisation in imitation learning and provides a valuable tool for developing more robust agents.
Croissant File: json
Dataset URL: https://huggingface.co/datasets/NathanGavenski/Labyrinth-v0_5x5
Code URL: https://github.com/NathanGavenski/Labyrinth
Supplementary Material: pdf
Primary Area: Data for Reinforcement learning (e.g., decision and control, planning, hierarchical RL, robotics)
Submission Number: 1833
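A minimal sketch for inspecting the published expert dataset via the Dataset URL above, using the Hugging Face datasets library; this is an assumed usage pattern rather than the authors' documented API, and the split and column names are not assumed but simply printed as found.

    # Minimal sketch (assumed usage): load and inspect the Labyrinth dataset
    # published on the Hugging Face Hub. Only the repository id from the
    # Dataset URL above is taken from the submission; everything else is a
    # generic inspection pattern.
    from datasets import load_dataset

    # Load all available splits for the 5x5 Labyrinth dataset.
    ds = load_dataset("NathanGavenski/Labyrinth-v0_5x5")

    # Show the available splits, their column names, and their sizes.
    print(ds)

    # Peek at the first record of the first split to see the stored fields
    # (e.g. states and expert actions) without assuming their exact names.
    first_split = next(iter(ds.values()))
    print(first_split[0])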