Multimodal Design for Interactive Collaborative Problem-Solving Support

Published: 01 Jan 2024, Last Modified: 20 May 2025, HCI (6) 2024, CC BY-SA 4.0
Abstract: When analyzing interactions during collaborative problem-solving (CPS) tasks, many different communication modalities are likely to be present and interpretable. These modalities may include speech, gesture, action, affect, pose, and object position in physical space, among others. As AI becomes more prominent in day-to-day use and in various learning environments, such as classrooms, it has the potential to offer additional insight into how small groups work together to complete CPS tasks. Designing interactive AI to support CPS requires creating a system that handles multiple modalities. In this paper, we discuss the importance of multimodal features for modeling CPS, how different modal channels must interact in a multimodal AI agent that supports a wide range of tasks, and the design considerations that require forethought when building such a system so that it most effectively interacts with and aids small groups in successfully completing CPS tasks. We also outline various tool sets that can be leveraged to support each of the individual features and their integration, as well as various applications for such a system.