MCU: A Task-centric Framework for Open-ended Agent Evaluation in Minecraft

Haowei Lin; Zihao Wang; Jianzhu Ma; Yitao Liang

Published: 28 Oct 2023, Last Modified: 07 Dec 2023, ALOE 2023 Poster

Keywords: Open-ended Agent, Benchmark, Evaluation, Minecraft

Abstract: To pursue the goal of creating an open-ended agent in Minecraft, an open-ended game environment with unlimited possibilities, this paper introduces MCU, a novel task-centric framework for Minecraft agent evaluation. The MCU framework uses atom tasks as fundamental building blocks, enabling the generation of diverse or even arbitrary tasks. Within the MCU framework, each task is measured with six distinct difficulty scores (time consumption, operational effort, planning complexity, intricacy, creativity, novelty). These scores assess a task along multiple dimensions and can therefore reveal an agent's capability on specific facets. The difficulty scores also serve as the features of each task, which creates a meaningful task space and reveals the relationships between tasks. For practical evaluation of Minecraft agents with the MCU framework, we maintain two custom benchmarks, comprising tasks designed to evaluate agents' proficiency in high-level planning and low-level control, respectively. We show that MCU is expressive enough to cover all tasks used in recent literature on Minecraft agents, and we highlight the need for advancements in areas such as creativity, precise control, and out-of-distribution generalization in pursuit of open-ended Minecraft agents.
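
The idea of difficulty scores acting as task features can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration: the class and function names, the task names, the numeric scores, the 0 to 1 scale, and the use of Euclidean distance are all hypothetical and are not taken from the MCU framework itself. The sketch only shows how six scores per task could define a task space in which nearby tasks are similar.

```python
import math
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class DifficultyScores:
    """Six difficulty dimensions named in the abstract (scale is assumed)."""
    time_consumption: float
    operational_effort: float
    planning_complexity: float
    intricacy: float
    creativity: float
    novelty: float

    def as_vector(self):
        # Field order follows the order of declaration above.
        return astuple(self)


def task_distance(a, b):
    """Euclidean distance between two tasks in the six-dimensional space."""
    return math.dist(a.as_vector(), b.as_vector())


def nearest_task(query, tasks):
    """Return the name of the task closest to `query`, excluding itself."""
    reference = tasks[query]
    others = (name for name in tasks if name != query)
    return min(others, key=lambda name: task_distance(reference, tasks[name]))


# Hypothetical tasks with made-up scores, for illustration only.
tasks = {
    "mine_log": DifficultyScores(0.1, 0.2, 0.1, 0.1, 0.0, 0.0),
    "craft_planks": DifficultyScores(0.2, 0.2, 0.2, 0.1, 0.0, 0.0),
    "build_house": DifficultyScores(0.8, 0.7, 0.6, 0.7, 0.9, 0.5),
}

print(nearest_task("mine_log", tasks))
```

Under these made-up scores, the two simple gathering and crafting tasks sit close together while the building task lies far from both, which is the kind of relationship between tasks that a feature-based task space is meant to expose.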

Submission Number: 50