Bootstrapping Assessments for Team Simulations: Transfer Learning Between First-Person-Shooter Game Maps

Benjamin D. Nye, Mark G. Core, Sai V. R. Chereddy, Vivian Young, Daniel Auerbach

Published: 2024, Last Modified: 13 Nov 2024, Venue: HCI (44) 2024, License: CC BY-SA 4.0

Abstract: Assessing teams and providing feedback on scenario-based training typically requires human observers or scenario-specific metrics crafted by experts, due to the complexity of building general-purpose automated tools for assessing team performance. Machine learning can help infer team performance patterns, but labeled data for a specific training scenario is often sparse. To address this issue, the Semi-Supervised Learning for Assessing Team Simulations (SLATS) project investigated the feasibility of semi-supervised learning and of transfer learning, which leverages training data from related scenarios to classify performance on a target scenario with the same metrics but a different terrain context. To evaluate this approach, we analyzed the performance of teams in the first-person shooter Team Fortress 2 (TF2). TF2 teams for the “Capture Point” mode were classified into four archetypes (novice, weak link, team of experts, and expert team) based on the performance of the team and the performance of its individual members across the corpus. To investigate the feasibility of transfer learning, we isolated matches from two of the most frequent maps/terrains. Results showed that leveraging data from the source map always improved classification F1-scores compared to relying solely upon target (test) map training data. The greatest benefits were observed when target data was limited (0 to 42 target examples). While further research is required to explore the effectiveness of transfer learning across training scenarios that are more dissimilar (e.g., different simulations, rather than just different maps), these results offer a promising direction to help bootstrap team assessments on new training scenarios by leveraging data from earlier, comparable scenarios. However, efficiently calculating reusable metrics for model features based on low-level scenario events and logs remains a challenge that requires further research.