Dungeons and Data: A Large-Scale NetHack Dataset

Eric Hambro; Roberta Raileanu; Danielle Rothermel; Vegard Mella; Tim Rocktäschel; Heinrich Kuttler; Naila Murray

Published: 17 Sept 2022; Last Modified: 04 Aug 2025; Venue: NeurIPS 2022 Datasets and Benchmarks; Readers: Everyone

Keywords: reinforcement learning, offline RL, RL dataset, procedural generation, human demonstrations

TL;DR: We introduce and evaluate a new large-scale dataset for the game of NetHack, including 10 billion transitions from humans, 3 billion from a symbolic bot, and code for researchers to record and load their own trajectories.

Abstract: Recent breakthroughs in developing agents that solve challenging sequential decision-making problems, such as Go, StarCraft, or DOTA, have relied on both simulated environments and large-scale datasets. However, progress in this area has been hindered by the scarcity of open-source datasets and the prohibitive computational cost of working with them. Here we present the NetHack Learning Dataset (NLD), a large and highly scalable dataset of trajectories from the popular game of NetHack, which is both extremely challenging for current methods and very fast to run. NLD consists of three parts: 10 billion state transitions from 1.5 million human trajectories collected on the NAO public NetHack server from 2009 to 2020; 3 billion state-action-score transitions from 100,000 trajectories collected from the symbolic bot that won the NetHack Challenge 2021; and accompanying code for users to record, load, and stream any collection of such trajectories in a highly compressed form. We evaluate a wide range of existing algorithms for learning from demonstrations and show that significant research advances are needed to fully leverage large-scale datasets for challenging sequential decision-making tasks.
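The totals in the abstract imply an average trajectory length for each part. The stated totals are rounded, so these are approximate averages derived here. They are not per-episode statistics reported by the authors.

```latex
\bar{T}_{\text{human}} \approx \frac{10 \times 10^{9}\ \text{transitions}}{1.5 \times 10^{6}\ \text{trajectories}} \approx 6.7 \times 10^{3},
\qquad
\bar{T}_{\text{bot}} \approx \frac{3 \times 10^{9}\ \text{transitions}}{1 \times 10^{5}\ \text{trajectories}} \approx 3 \times 10^{4}
```

On average, the bot's trajectories are therefore roughly 4.5 times longer than the human ones.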

Author Statement: Yes

URL: https://github.com/dungeonsdatasubmission/dungeonsdata-neurips2022

Dataset URLs:
Submission: https://github.com/dungeonsdatasubmission/dungeonsdata-neurips2022
Repo: https://github.com/facebookresearch/nle
Dataset Document: https://github.com/facebookresearch/nle/blob/main/DATASET.md
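The abstract mentions code to record, load, and stream trajectories in a highly compressed form. The Python sketch here is only a rough illustration of that idea, and it rests on one assumption: that each game is a bzip2-compressed file in the standard ttyrec terminal-recording format. That format, commonly used by public NetHack servers such as NAO, stores each frame as a 12-byte little-endian header (seconds, microseconds, payload length) followed by the payload. This is not the NLD loader from the linked NLE repository. The names `encode_frame`, `iter_frames`, and `iter_bz2_ttyrec` are hypothetical.

```python
import bz2
import io
import struct
from typing import BinaryIO, Iterator, NamedTuple, Union

# ASSUMPTION: standard ttyrec framing, not the NLD/NLE on-disk format.
# Header = three little-endian unsigned 32-bit ints: seconds, microseconds,
# payload length (12 bytes total), followed by `length` bytes of payload.
HEADER = struct.Struct("<III")


class Frame(NamedTuple):
    timestamp: float  # wall-clock time of the frame, in seconds
    data: bytes       # raw terminal output (text plus ANSI escape sequences)


def encode_frame(timestamp: float, data: bytes) -> bytes:
    """Serialize one frame (the 'record' side of the format)."""
    sec = int(timestamp)
    usec = int(round((timestamp - sec) * 1_000_000))
    if usec == 1_000_000:  # rounding carried into the next second
        sec, usec = sec + 1, 0
    return HEADER.pack(sec, usec, len(data)) + data


def iter_frames(stream: BinaryIO) -> Iterator[Frame]:
    """Yield frames one at a time so a long game never sits fully in memory."""
    while True:
        header = stream.read(HEADER.size)
        if not header:
            return  # clean end of stream
        if len(header) < HEADER.size:
            raise ValueError("truncated ttyrec header")
        sec, usec, length = HEADER.unpack(header)
        data = stream.read(length)
        if len(data) < length:
            raise ValueError("truncated ttyrec payload")
        yield Frame(sec + usec / 1_000_000, data)


def iter_bz2_ttyrec(source: Union[str, BinaryIO]) -> Iterator[Frame]:
    """Stream frames from a bzip2-compressed ttyrec (path or binary file object).

    Decompression happens incrementally as frames are read.
    """
    with bz2.open(source, "rb") as f:
        yield from iter_frames(f)


if __name__ == "__main__":
    # Round trip: record two frames, compress them, then stream them back.
    raw = encode_frame(1262304000.25, b"first frame") + encode_frame(
        1262304001.5, b"\x1b[H\x1b[2J@"
    )
    compressed = io.BytesIO(bz2.compress(raw))
    for frame in iter_bz2_ttyrec(compressed):
        print(f"{frame.timestamp:.2f}", frame.data)
```

The reader pulls frames lazily from the decompressing stream. Memory use therefore stays bounded by roughly one frame at a time rather than by the length of a whole game.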

License: The dataset and code are provided under the NetHack General Public License, which can be found here: https://github.com/facebookresearch/nle/blob/main/LICENSE

Supplementary Material: pdf

Contribution Process Agreement: Yes

In Person Attendance: Yes
