Abstract: This paper introduces MineLlama, a lightweight framework that uses a locally deployed large language model, Llama, to enhance decision-making in the sandbox game Minecraft without relying on external APIs. MineLlama has a two-layer architecture consisting of a planning module and an executing module. The planning module employs retrieval-augmented generation (RAG) with a query engine built on Minecraft recipe information to decompose a final goal into a series of interdependent subgoals. The executing module also uses RAG, with a query engine informed by general knowledge of Minecraft, to guide the agent in choosing appropriate actions. The framework’s efficiency is demonstrated through evaluations on 7 diverse Minecraft tasks, which showcase its ability to guide agents in achieving specific goals. All materials, including code and videos, are available at: https://minecraftagents.github.io/MineLlama_hp/.
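As an illustration of what decomposing a final goal into interdependent subgoals can look like, the following is a minimal sketch that orders subgoals by walking a recipe table. It is not the MineLlama implementation: the `RECIPES` table, its simplified contents, and the `decompose` function are hypothetical, and a plain dictionary lookup stands in for the RAG query engine described above.

```python
# Hypothetical, simplified recipe table for illustration only.
# Each item maps to the items needed to obtain it; an empty list
# marks a raw material that is gathered directly.
RECIPES = {
    "wooden_pickaxe": ["planks", "stick"],
    "stick": ["planks"],
    "planks": ["log"],
    "log": [],
}


def decompose(goal, recipes, ordered=None):
    """Return subgoals in dependency order (prerequisites first).

    Assumes the recipe table is acyclic; an item missing from the
    table is treated as a raw material.
    """
    if ordered is None:
        ordered = []
    # Resolve every prerequisite before the goal itself (post-order walk).
    for ingredient in recipes.get(goal, []):
        decompose(ingredient, recipes, ordered)
    # Shared prerequisites (here, planks) appear only once in the plan.
    if goal not in ordered:
        ordered.append(goal)
    return ordered


plan = decompose("wooden_pickaxe", RECIPES)
print(plan)
```

In this sketch each subgoal appears after the subgoals it depends on, so an executing component could work through the list in order.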