Factorio Learning Environment

Jack Hopkins; Mart Bakler; Akbir Khan

Published: 18 Sept 2025, Last Modified: 20 Jan 2026

Venue: NeurIPS 2025 Datasets and Benchmarks Track (poster)

License: CC BY 4.0

Keywords: open-endedness, evaluation, benchmark, long-term planning, automation, sandbox, factorio, LLM, agent

TL;DR: Factorio Learning Environment is an evaluation for frontier models that offers exponentially scaling challenges.

Abstract: Large Language Models (LLMs) are rapidly saturating existing benchmarks, necessitating new open-ended evaluations. We introduce the Factorio Learning Environment (FLE), based on the game of Factorio, that tests agents in long-term planning, spatial reasoning, program synthesis, and resource optimization. FLE provides exponentially scaling challenges -- from basic automation to complex factories processing millions of resource units per second. We provide two settings: (1) open-play, with the open-ended task of building the largest factory on a procedurally generated map, and (2) lab-play, consisting of 33 bounded tasks across three settings with fixed resources. We demonstrate across both settings that models still lack strong spatial reasoning. In lab-play, we find that LLMs exhibit promising short-horizon skills, yet are unable to operate effectively in constrained environments, reflecting limitations in error analysis. In open-play, while LLMs discover automation strategies that improve growth (e.g., electric-powered drilling), they fail to achieve complex automation (e.g., electronic-circuit manufacturing).

Code URL: https://github.com/Anon28352/factorio-learning-environment

Primary Area: Datasets & Benchmarks for applications in language modeling and vision language modeling

Submission Number: 1814
