Keywords: Human-AI collaboration, AI Companion, Game AI system
TL;DR: This paper introduces the first open-interaction, real-time, voice-operated AI companion system for commercial 3D FPS games.
Abstract: Players in first-person shooter (FPS) games have traditionally been limited to directing AI companions with simple commands such as “attack,” “defend,” or “retreat,” because existing input methods such as hotkeys and command wheels allow little more. A major limitation of these commands is their lack of target specificity: the numerous targets in a 3D virtual environment are difficult to designate with such input methods. As a result, players cannot issue complex tactical instructions such as “clear the second floor,” “take cover behind that tree,” or “retreat to the river.” To overcome this limitation, this paper introduces the $\textbf{A}$I $\textbf{C}$ompanion with $\textbf{V}$oice $\textbf{I}$nteraction $(\textbf{ACVI})$, the first AI system that allows players to interact with FPS AI companions through natural language. Deployed in the popular FPS game $\textit{Arena Breakout: Infinite}$, this feature gives players a more immersive experience by letting them work with human-like AI teammates. ACVI is not confined to executing a limited set of commands through simple rule-based systems; instead, it lets players engage in real-time voice interaction with AI teammates. By integrating various natural language processing techniques within a confidence-based selection framework, it decomposes complex commands and reasons about intent rapidly and accurately. Moreover, ACVI employs a multi-modal dynamic entity retrieval method for environmental perception, aligning human intentions with decision-making elements. It accurately comprehends complex voice commands and delivers real-time behavioral responses and vocal feedback, providing close tactical collaboration with players. Additionally, it can identify more than 17,000 objects in the game, including buildings, vehicles, grasslands, and collectible items, and can accurately distinguish different colors and materials.
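The abstract does not specify how the confidence-based selection framework works, so the following is only a minimal sketch of one way such a scheme could be arranged, not the authors' implementation. It assumes a cascade in which cheap parsers run first and a result is accepted once its confidence clears a threshold. All names (`Parse`, `keyword_parser`, `pattern_parser`, `fallback_parser`, `select_parse`), the confidence values, and the threshold of 0.8 are hypothetical and chosen purely for illustration.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Parse:
    intent: str
    target: Optional[str]
    confidence: float


# Hypothetical command vocabulary, taken from the examples in the abstract.
SIMPLE_COMMANDS = {"attack", "defend", "retreat"}
PATTERN = re.compile(r"^(clear|take cover behind|retreat to)\s+(?:the|that)\s+(.+)$")


def keyword_parser(text: str) -> Optional[Parse]:
    """Cheapest stage: exact match against the simple command set."""
    t = text.strip().lower()
    if t in SIMPLE_COMMANDS:
        return Parse(t, None, 0.95)  # illustrative confidence value
    return None


def pattern_parser(text: str) -> Optional[Parse]:
    """Second stage: a fixed verb phrase followed by a free-form target."""
    m = PATTERN.match(text.strip().lower())
    if m:
        return Parse(m.group(1), m.group(2), 0.85)  # illustrative confidence value
    return None


def fallback_parser(text: str) -> Optional[Parse]:
    """Stand-in for a heavier model; here it only reports low confidence."""
    return Parse("unknown", None, 0.1)


Parser = Callable[[str], Optional[Parse]]
PARSERS: List[Parser] = [keyword_parser, pattern_parser, fallback_parser]


def select_parse(text: str, parsers: List[Parser], threshold: float = 0.8) -> Optional[Parse]:
    """Return the first parse at or above the threshold, else the most confident one."""
    best: Optional[Parse] = None
    for parser in parsers:
        result = parser(text)
        if result is None:
            continue
        if result.confidence >= threshold:
            return result
        if best is None or result.confidence > best.confidence:
            best = result
    return best


if __name__ == "__main__":
    for command in ["attack", "clear the second floor", "retreat to the river"]:
        print(command, "->", select_parse(command, PARSERS))
```

In this sketch the early exit keeps latency low for simple commands, while the best-so-far fallback means an utterance that no stage handles confidently still yields a result the caller can inspect or reject.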
Primary Area: applications to computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Submission Number: 13814