Keywords: Imitation Learning, Generalisation, Environment, Benchmark
Abstract: Imitation learning benchmarks often lack sufficient variation between training and evaluation settings, limiting meaningful assessment of generalisation.
We introduce Labyrinth, a benchmarking environment designed to test generalisation with precise control over structure, start and goal positions, and task complexity.
It enables verifiably distinct training, evaluation, and test settings.
Labyrinth provides a discrete, fully observable state space and known optimal actions, supporting interpretability and fine-grained evaluation.
Its flexible setup allows targeted testing of individual generalisation factors and includes variants such as partial observability, key-and-door tasks, and ice-floor hazards.
By enabling controlled, reproducible experiments, Labyrinth advances the evaluation of generalisation in imitation learning and provides a valuable tool for developing more robust agents.
Croissant File: json
Dataset URL: https://huggingface.co/datasets/NathanGavenski/Labyrinth-v0_5x5
Code URL: https://github.com/NathanGavenski/Labyrinth
Supplementary Material: pdf
Primary Area: Data for Reinforcement learning (e.g., decision and control, planning, hierarchical RL, robotics)
Submission Number: 1833
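A minimal sketch for inspecting the published expert dataset via the Dataset URL above, using the Hugging Face datasets library; this is an assumed usage pattern rather than the authors' documented API, and the split and column names are not assumed but simply printed as found.

    # Minimal sketch (assumed usage): load and inspect the Labyrinth dataset
    # published on the Hugging Face Hub. Only the repository id from the
    # Dataset URL above is taken from the submission; everything else is a
    # generic inspection pattern.
    from datasets import load_dataset

    # Load all available splits for the 5x5 Labyrinth dataset.
    ds = load_dataset("NathanGavenski/Labyrinth-v0_5x5")

    # Show the available splits, their column names, and their sizes.
    print(ds)

    # Peek at the first record of the first split to see the stored fields
    # (e.g. states and expert actions) without assuming their exact names.
    first_split = next(iter(ds.values()))
    print(first_split[0])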