CrafText Benchmark: Advancing Language Grounding in Complex Multimodal Open-Ended World

16 Aug 2024 (modified: 27 Aug 2024) · ACL ARR 2024 August Submission · CC BY 4.0

Abstract: Grounding language models in multimodal environments is a pivotal challenge in AI. Solving it enables agents to link linguistic inputs with sensory data, such as visual information. Existing environments, however, often limit the complexity of agent behavior because of restricted dynamics or vocabulary. To address these limitations, we propose a new benchmark, CrafText, based on the Craftax environment: a dynamic, stochastic setting with extensive game mechanics and a rich vocabulary. The benchmark is designed to evaluate agents on complex tasks involving spatial reasoning, logic, and context, and it offers a rigorous platform for advancing multimodal AI research.

Paper Type: Long

Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond

Research Area Keywords: cross-modal application; multimodality; embodied agents

Contribution Types: Publicly available software and/or pre-trained models, Data resources, Surveys

Languages Studied: English

Submission Number: 474