Keywords: Reinforcement Learning, Model-based Reinforcement Learning, Shielding, Alignment, Non-Player Characters, Machine Learning, Reward Shaping, Robotics
TL;DR: In robotics, shields constrain a task-oriented policy so that the robot follows predefined safety requirements. We explore how shielding can be used to design stylistic behaviours for Non-Player Characters in video games.
Abstract: Reinforcement Learning can create agents that play games at a human, or even super-human, level. Shielding in Reinforcement Learning is a technique used in robotics to enforce safe decision-making during both learning and execution, allowing robots to perform tasks safely. We explore how shields can be repurposed to align an agent with a designed style specification, yielding more human-like and believable Non-Player Character behaviour in video games. Shielding can alleviate the need for extensive reward shaping when designing qualitative behaviours. However, classical shielding is often too restrictive in its assumptions to be easily applied to complex 3D environments with large continuous state spaces, such as video games. We therefore propose repurposing Approximate Model-based Shielding (AMBS) for video games and explore how it can be used to add a style aspect to a task policy.
Submission Number: 18