## 2. Multi-Agent Human-Robot Interaction

To formally define multi-agent Human-Robot Interaction (HRI) systems, we first need to define 'interaction'. The American Psychological Association (APA, Dictionary of Psychology) defines a social interaction as a process that involves reciprocal stimulation or response between two or more individuals. In the context of multi-agent HRI, however, reciprocity is not a necessary requirement, and an interaction can be defined as: information flow between two or more agents occurring as a result of communication, action or mere presence of any of those agents. This information flow results in a change in the actions, behavior or mental state of the agents on the receiving end. Using this definition, we can express a multi-agent HRI system using an interaction graph (Fig. 1). The figure shows a multi-agent HRI system and the corresponding interaction graph. Each agent is depicted as a node in a directed graph in which each edge represents an interaction between two agents and points in the direction of information flow. We use this interaction graph structure to define multi-agent HRI systems:

Multi-agent Human-Robot Interaction systems are those for which the interaction graph contains three or more nodes (agents), with at least three nodes connected via interaction edges, including at least one interaction edge between a human node and a robot node.
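The definition above can be checked mechanically on a small interaction graph. The following is a minimal sketch, not part of the original formulation: the names `Agent`, `InteractionGraph` and `is_multi_agent_hri` are hypothetical, and the phrase "at least three nodes connected" is read here as an assumption meaning that some weakly connected group of three or more agents contains a human-robot edge.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Agent:
    name: str
    kind: str  # "human" or "robot"


@dataclass
class InteractionGraph:
    # Directed edges (sender, receiver): information flows sender -> receiver.
    edges: set = field(default_factory=set)

    def add_interaction(self, sender: Agent, receiver: Agent) -> None:
        self.edges.add((sender, receiver))

    def _components(self):
        """Yield weakly connected components (edge direction ignored)."""
        neighbours = defaultdict(set)
        for s, r in self.edges:
            neighbours[s].add(r)
            neighbours[r].add(s)
        seen = set()
        for start in neighbours:
            if start in seen:
                continue
            component, stack = set(), [start]
            while stack:
                node = stack.pop()
                if node in component:
                    continue
                component.add(node)
                stack.extend(neighbours[node] - component)
            seen |= component
            yield component

    def is_multi_agent_hri(self) -> bool:
        # Assumed reading: a connected group of >= 3 agents that
        # includes at least one human-robot interaction edge.
        for component in self._components():
            if len(component) < 3:
                continue
            for s, r in self.edges:
                if s in component and {s.kind, r.kind} == {"human", "robot"}:
                    return True
        return False


h1 = Agent("h1", "human")
r1, r2 = Agent("r1", "robot"), Agent("r2", "robot")

dyad = InteractionGraph()
dyad.add_interaction(h1, r1)

team = InteractionGraph()
team.add_interaction(h1, r1)  # human commands a robot
team.add_interaction(r1, r2)  # robot relays information to another robot
```

Under this reading, `dyad` is an ordinary dyadic system, while `team` satisfies the definition because three agents are connected and one edge links a human to a robot.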

The information exchange can happen by means of verbal/non-verbal communication or through an interaction interface. The agents may be homogeneous in their roles and identities, or they can be heterogeneous and contribute differently to the system. Moreover, these agents can be located in separate environments and interact over virtual channels, or they can share the same physical space. In this survey, we consider the literature on multi-agent HRI systems and try to characterize them based on different properties.

Interactions in multi-agent human-robot systems do bear similarities with interactions in human-only groups. As we discuss in later sections, several studies in the literature try to apply understanding of human-only groups to multi-agent human-robot systems (e.g., by using concepts of human group interactions from social sciences). When compared to robot-only groups, interactions in human-robot systems come with a higher uncertainty of outcome, and studies emphasize making the human response to an interaction more predictable (e.g., by increasing humans' trust in robots or by making robots' intentions clearer). In this paper, we also discuss how research from human-only and robot-only groups can help multi-agent HRI, and open new avenues for future research.

Human-Robot Interactions among multiple agents, when compared to dyadic interactions, introduce a number of changes and complexities in the system. With multiple agents present, multi-agent interactions and indirect interactions can emerge in the system. It also becomes possible to have humans interacting with other humans and robots with other robots. These Human-Human Interactions (HHI) and Robot-Robot Interactions (RRI) then ultimately affect (or are affected by) interactions between humans and robots. The presence of multiple agents also requires advanced communication modalities and interfaces to enable effective interaction among all agents. As the number of agents in the system grows, modelling their behavior and controlling their actions becomes increasingly challenging. We look at each of these aspects of multi-agent HRI systems separately.

<!-- image -->

We propose that multi-agent HRI systems can be characterized and compared to one another based on three core aspects: 1) Team structure, 2) Interaction style and 3) Computational characteristics. These aspects represent the way a system is set up, the way interactions take place, and the methods by which agents are controlled in that system.

Team Structure: The first aspect relates to agents that constitute the human-robot system. This includes parameters such as the number and types of agents (both humans and robots) present in the system, and homogeneity among different agents. Homogeneity is a measure of similarities in roles, capabilities, embodiment and the authorities granted to each of the agents (deciding who can initiate an interaction, give commands, request assistance etc.).

Interaction Style: The second aspect of multi-agent HRI systems describes how the agents interact with each other. This includes factors such as modality of communication happening among the agents and interaction models being implemented. This tells us the way in which interactions take place in the system (through a screen or speech etc.) and the agents involved (between two or among multiple agents).

Computational Characteristics: The third aspect looks into the software part of HRI systems and describes how the behavior of different agents is controlled or influenced. This includes robot task planning algorithms, model-based/model-free controllers and other ways of deciding robots' actions in the system, and the ways in which they are used to influence human behavior.

These three aspects give us a way to establish distinctions and comparisons among multi-agent HRI systems. For instance, consider a system where a human operator is supervising a team of remote mobile robots (e.g., Swamy et al., 2020; Rosenfeld et al., 2017). We can characterize this system as one with a single human interacting with multiple robots (team size), and one where all robots are similar to one another (homogeneity). The human might be able to give high-level commands to the robot group or control them individually (interaction model), using a screen-based interface (communication modality). The robot control can be made adaptive to actions of the operator or designed to optimize some utility function (computational characteristics).
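As a rough illustration of how such a characterization could be recorded and compared, the sketch below stores the attributes from the example in a small record type. The class `HRISystemProfile`, its field names, the helper `differing_attributes`, and the robot count of 4 are all hypothetical choices made for this illustration; they are not a schema proposed in the survey.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class HRISystemProfile:
    # Team structure
    num_humans: int
    num_robots: int
    homogeneous_robots: bool
    # Interaction style
    interaction_model: str
    communication_modality: str
    # Computational characteristics
    control_strategy: str


def differing_attributes(a: HRISystemProfile, b: HRISystemProfile) -> list:
    """Return the names of the attributes on which two systems differ."""
    return [f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)]


# The supervised-fleet example: one operator, several similar robots,
# individual control through a screen-based interface.
individual_control = HRISystemProfile(
    num_humans=1,
    num_robots=4,  # illustrative value only
    homogeneous_robots=True,
    interaction_model="one-to-one",
    communication_modality="screen-based interface",
    control_strategy="adaptive to operator actions",
)

# Same team, but the operator issues high-level commands to the whole group.
group_control = HRISystemProfile(
    num_humans=1,
    num_robots=4,
    homogeneous_robots=True,
    interaction_model="one-to-many",
    communication_modality="screen-based interface",
    control_strategy="adaptive to operator actions",
)
```

Comparing the two profiles with `differing_attributes` isolates the interaction model as the only attribute on which these two hypothetical systems differ.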

Under these core aspects, we can characterize HRI systems based on five different attributes, as shown in Fig. 2. It should be noted that these core aspects are not independent of each other, and a system attribute under one aspect can influence other system attributes. This interaction is discussed in Section 6. Also note that these core aspects are not meant to be exhaustive or the only way of distinguishing multi-agent HRI systems. The aspects are chosen such that they allow for a taxonomy that is both broad enough to be applicable to multi-agent HRI systems from different areas of research, and detailed enough to meaningfully characterize and compare those systems. In the following sections, we consider the above three aspects and the associated attributes in more detail, and present the categorizations that arise under these attributes. Under each category, we include examples from recent literature to understand its application, and analyze strengths and limitations of different types of systems.

## 3. Team Structure

The most perceptible feature of a multi-agent HRI system is the size and composition of the human-robot team. Depending on the application, a human-robot system can take advantage of more than one human and/or more than one robot in the team. The task at hand may also require a team of agents with different capabilities and roles. Both the number and the type of robots in a group can have significant effects on humans' perceptions of and emotions towards the robots (Fraune et al., 2015). Since humans and robots usually have different ways of acting in a collaborative setting and interacting with their partners, the team structure decides various other aspects of the system, including the way in which different agents interact, how they can share information and how their actions are planned (Fincannon et al., 2013, 2011). Based on team structure, multi-agent systems can differ from dyadic systems in two (quite apparent) ways:

1. There can be several possibilities of having different numbers of agents (both humans and robots) in the team,
2. There exists a notion of homogeneity/heterogeneity among the agents.

Therefore, in the following we discuss two factors under team structure: team size and team composition.

## 3.1. Team Size

Team size in an HRI system refers to the number of humans and robots present in the system. Including single-human - single-robot systems, HRI systems can be grouped into the following categories based on their team size:

## 3.1.1. Single-human - single-robot

These are the conventional systems containing a single human interacting with a single robot and are the most common type of systems studied in HRI. This article, however, focuses mainly on systems with a higher number of agents, and we refer readers to other articles in the literature for a review of single-human - single-robot systems, and HRI in general (Bartneck et al., 2020; Breazeal et al., 2016; Goodrich and Schultz, 2008; Bauer et al., 2008).

## 3.1.2. Single-human - multi-robot

These systems comprise teams with a single human interacting or collaborating with multiple robots throughout the task. Such systems have found their utility in applications where the task is primarily executed by a number of (semi)autonomous robots requiring intermittent interventions/assistance from a human operator or supervisor, either in the event of a fault (Swamy et al., 2020; Wang et al., 2014) or to further increase performance of the multi-robot team in areas like large-scale assembly (Sellner et al., 2006) and search-and-rescue (Khasawneh et al., 2019; Wang and Lewis, 2008). These systems are also employed in Human-Swarm Interaction (Kolling et al., 2015), where multiple robots coordinate among themselves while receiving inputs from the human teammate. Many studies have also been conducted to increase our understanding of such systems, by providing measures for predicting the team's performance (Lewis, 2013; Sycara and Lewis, 2012; Zheng et al., 2011; Crandall et al., 2003).

A different structure of single-human - multi-robot systems is one where the human user is not in the role of operator or supervisor. For instance, a team of robots can be used to provide navigation instructions to the human user (Yedidsion et al., 2019), cooperatively navigate with them in potentially dangerous situations (Penders et al., 2011; Saez-Pons et al., 2011), or efficiently guide them in an indoor environment (Tan et al., 2019; Khandelwal et al., 2015). Another application of this team size is seen in studies like Emotional Storytelling in the Classroom (Leite et al., 2015), where multiple robots team up to execute a storytelling task in front of a student. Several systems have employed a team of robot actors to have more effective storytelling or drama (Swaminathan et al., 2021; Murphy et al., 2011).

## 3.1.3. Multi-human - single-robot

Collaborative systems with multiple human teammates interacting with a single robot have traditionally been used in applications like search and exploration and operating unmanned aerial vehicles (UAVs), where several humans collaborate among themselves to manage a robot (Bruemmer et al., 2005; Murphy, 2004). In such settings, humans take different roles (e.g., pilots, sensor operators etc.) and manage separate components of the robot's operation (Murphy et al., 2008; Drury et al., 2006). These applications have been important in scenarios when a single human is unable to manage the robot, or when robot failure can result in critical degradation of the human's ability to supervise multiple robots (McCarley and Wickens, 2005). There are also several systems in which a robot is deployed in an environment with multiple humans it can interact with, either to assist them (Claure et al., 2020; Carlson et al., 2015) or to get assistance itself (Rosenthal et al., 2012). Another interesting application of such team size is seen in systems where a robot is used as a resource distribution agent within a team of multiple humans (Jung et al., 2020; Claure et al., 2020), or as a moderator in a group interaction setting (Short and Mataric, 2017; Vázquez et al., 2016).

<!-- image -->

<!-- image -->

<!-- image -->

Figure 3: Different possible configurations of multi-agent HRI systems based on team size. a) A robot in a social environment (Tseng et al., 2016); b) Investigating effects of multiple robots on a human participant (Podevijn et al., 2016); c) Three robots interacting with each other and with the people around them (Fraune et al., 2020).

More recently, with increasing research in social robotics, we see a rapid emergence of robotic systems in classroom and public settings. Starting with triadic interactions (Salam et al., 2016; Wainer et al., 2014; Johansson et al., 2013), research has presented robots collaborating with multiple users in applications like autism therapy (Kim et al., 2013; Kozima et al., 2005; Dautenhahn, 2003), education (Fernández-Llamas et al., 2020; Chandra et al., 2016; Tanaka et al., 2015), interactions in public spaces (Fortunati et al., 2018) and other multiple human-robot social interactions (Nanavati et al., 2020; Foster et al., 2012). For a more detailed review of systems with robots interacting with a group of humans, readers are encouraged to refer to Sebo et al. (2020).

## 3.1.4. Multi-human - multi-robot

This category consists of systems having both more than one human and more than one robot in the team. This team size results in systems that are possibly the most challenging to understand and control due to an exponential growth in uncertainty and the number of states that the system can be in (Dahiya et al., 2022). Such systems have been used in military applications where a human team consisting of members in different roles (pilots, supervisors, etc.) needs to coordinate with a team of (semi)autonomous robots to achieve task goals (Ramchurn et al., 2015; Freedy et al., 2008).

Multi-human - multi-robot team composition is also seen in applications like search-and-rescue tasks (Kruijff et al., 2014; Lewis et al., 2011; Lee et al., 2010) and other tasks involving supervisory control of heterogeneous human-robot teams (Patel and Pinciroli, 2020; Driewer et al., 2007; Bradshaw et al., 2004).

Multi-human - multi-robot teams have also been discussed in several theoretical/computational studies addressing the task allocation and operator scheduling problems (Dahiya et al., 2022; Lippi and Marino, 2021; IJtsma et al., 2019; Malvankar-Mehta and Mehta, 2015). These studies discuss different methods for solving robot control, or assisting human decision-making in a multi-agent setting. As an example, Mina et al. (2020) present a framework for adaptive workload allocation based on agents' health conditions and work performances. Hari et al. (2020) present an approximation algorithm for task scheduling and sequencing, so that humans and robots are able to work on those tasks together when necessary. Another application area for such systems is seen more recently in social interaction settings where robots are included as social partners in a group of humans to understand the socio-emotional aspects of such teams (Oliveira et al., 2020; Correia et al., 2018; Iqbal and Riek, 2017), and in a classroom setting enabling group learning among students (Alves-Oliveira et al., 2019; Leite et al., 2016, 2015).
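The four team-size categories of Section 3.1 follow directly from the number of humans and robots in the team. A minimal sketch of that rule is shown below; the function name `team_size_category` is hypothetical, and the labels simply reuse the subsection titles.

```python
def team_size_category(num_humans: int, num_robots: int) -> str:
    """Map agent counts to one of the four team-size categories."""
    if num_humans < 1 or num_robots < 1:
        raise ValueError("an HRI system needs at least one human and one robot")
    human_part = "single-human" if num_humans == 1 else "multi-human"
    robot_part = "single-robot" if num_robots == 1 else "multi-robot"
    return f"{human_part} - {robot_part}"


# One operator supervising a fleet of robots.
fleet = team_size_category(num_humans=1, num_robots=5)

# A single robot moderating a group discussion among three people.
moderator = team_size_category(num_humans=3, num_robots=1)
```

Note that the category depends only on counts; heterogeneity among the agents is a separate attribute and is not captured by this rule.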

## 3.2. Team Composition

Besides the number of humans and robots, another important characteristic of a human-robot team is the team composition. This pertains to the aspect of homogeneity (or lack thereof) among the agents, be it humans or robots. Homogeneity in the case of robots may simply indicate whether there are different types of robots present in the team, with different hardware design (Sellner et al., 2006), manipulation capabilities (Karami et al., 2020) or interaction interfaces (Kruijff et al., 2014). In the case of humans, presence of different roles, capabilities or authority can introduce heterogeneity among team members (Chandra et al., 2016; Gombolay et al., 2015).

Homogeneous team composition is commonly seen in applications where agents are primarily identified as a part of a group rather than individually, such as robots in a swarm (Kolling et al., 2015), or humans in a crowd (Chen et al., 2019; Fortunati et al., 2018). Applications concerning a group of robots navigating among humans also mainly use homogeneous robots, as shown in Fig. 4(a) (Batista et al., 2020). In the system presented in Fig. 3(b), the authors investigate the effects of multiple robots on the human participant. Since the study focuses on the number of robots instead of their individual identities, homogeneous robots are used in the system (Podevijn et al., 2016). In addition to this, homogeneous robots are also useful in applications where each robot is required to work on similar tasks, such as object transfers in warehouse operations (Rosenfeld et al., 2016). In most social HRI applications, humans are present as equal members of the group and are thus considered homogeneous in their composition, e.g., Foster et al. (2012); Erel et al. (2021b). Heterogeneous robot teams are seen in applications where robots with different manipulators or mobility are required, e.g., Karami et al. (2020); Tan et al. (2019). In military applications, teams of heterogeneous humans (having different roles, responsibilities and authority) have been used to control one or more complex robots, e.g., Freedy et al. (2008).

<!-- image -->

<!-- image -->

<!-- image -->

<!-- image -->

Figure 4: Human-robot teams with different composition of agents. a) Multiple homogeneous robots aimed to socially navigate around a human (Batista et al., 2020); b) A human operator working with two heterogeneous robots in a product inspection task (Karami et al., 2020); c) A robot (in the centre) aiming to provide emotional support during interaction between two humans (Erel et al., 2021b); d) Two humans in a role of a leader and an assistant, working with a robot co-leader on a fetching and building task (Gombolay et al., 2017).

Homogeneity in a human-robot team greatly influences the type, level and efficiency of the interactions (Wang et al., 2009). For instance, when managing a team of robots, the human operator needs to put in more interaction effort when the team is heterogeneous than when it consists of homogeneous robots (Lewis, 2013; Goodrich et al., 2005), and this may lead to an increase in perceived workload and a decrease in situational awareness (Adams, 2009; Humphrey et al., 2007). Having homogeneous robots may also lead to a simpler interaction interface (Yanco and Drury, 2004). However, heterogeneous robot teams may allow specific robot capabilities to be used to carry out a variety of operations (Suh and Woo, 2009; Sellner et al., 2006). It is generally agreed that control and operation of a team of heterogeneous agents is more demanding compared to a homogeneous team; the literature therefore offers a variety of studies presenting efficient strategies for the control of heterogeneous multi-robot teams (Saribatur et al., 2019; Rosa, 2018; Saribatur et al., 2014).

Likewise, when more than one human is involved in the interactions, differences or similarities among the humans may govern system dynamics (Wang and Lewis, 2008). One common source of heterogeneity among human teammates is the association of different roles with different humans (Li et al., 2021; Kruijff et al., 2014). Earlier studies have presented a taxonomy with different roles (Supervisor, Operator, Peer, etc.) that a robot can assume in an HRI system (Scholtz, 2003), and similar roles can also be assigned to human teammates. There are several systems presenting industrial-oriented tasks that involve a team of heterogeneous humans in roles of supervisors and assistants (Gombolay et al., 2015; Murphy et al., 2008; Drury et al., 2006). This team composition has also found applications in social/educational robotics, e.g., in systems enabling robots to engage with users from different generations (Joshi and Šabanović, 2019; Short et al., 2017), or when humans have distinct roles/jobs in the group (Taheri et al., 2018).

## Key Observations

Considering these team characteristics, it is observed that a substantial amount of work has been done on one-to-many systems (single-human - multi-robot and multi-human - single-robot), while systems with both multiple humans and multiple robots are emerging (mainly in the planning and task scheduling literature). As collaborative robots are envisioned to work together with and among humans, it is essential to further the research that facilitates robots' interactions in systems with a larger number of humans.

Under all types of team sizes, we find systems where humans play different roles in reference to the robots: from supervisors of multiple robots to peers in a social group to information seekers in a public setting. Robots themselves show a similar variety of roles. In the past, multiple humans were required to control a single unmanned vehicle, especially in safety-critical settings (e.g., military use). However, with advancements in robot autonomy, such systems are fading away from the literature and more systems have emerged where a single human is able to supervise a large number of robots. Human supervision of multiple robots has its own challenges, such as limited human cognitive capabilities, increased workload and decreased situational awareness. In later sections, we discuss several studies working towards addressing these issues, including the design of decision support systems, improving robots' motion legibility and predictability, and building intuitive interaction interfaces. We expect more such studies to emerge as HRI systems with a large number of agents become increasingly popular. Even though we find many examples of research on systems with heterogeneous robots, implementation of such systems in a social setting or in human-assisting applications is still limited.

## 4. Interaction Style

In the context of human-robot systems, interaction style is a broad term that can be used to refer to different aspects of interaction such as the modes of communication among agents (Feine et al., 2019), interaction models (Fortunati et al., 2018) and the interaction interface (Gromov et al., 2016; Rule and Forlizzi, 2012). Interaction style also includes the method of communication (verbal/nonverbal) (Mavridis, 2015; Stiefelhagen et al., 2004), expression of affect (Schermerhorn and Scheutz, 2011) and spatial relationship (Chen et al., 2007; Hüttenrauch et al., 2006).

Interactions in multi-agent human-robot systems differ from those in dyadic systems in three ways:

1. First, with multiple agents present, it is possible to have interactions within a group of agents, viz. Human-Human Interactions, e.g., Kruijff et al. (2014), and Robot-Robot Interactions, e.g., Williams et al. (2015).
2. Second, in addition to the one-to-one interactions seen between humans and robots in dyadic HRI systems, one-to-many interactions are also realizable in multi-agent HRI systems, e.g., Fortunati et al. (2018).
3. Third, in multi-agent systems, there are additional types of interactions possible among the agents. Patel et al. (2021) discussed the differences between 'direct' and 'indirect' interactions between two humans in a system in which they can communicate either via verbal communication or through the interface. Che et al. (2020) investigated the role of 'explicit' and 'implicit' communication in social navigation. Abrams and Rosenthal-von der Pütten (2020) used the terms 'the group' and 'the observer' to distinguish the two perspectives of measuring group cohesion in a multi-agent HRI setting.

Such distinctions of interaction types are useful to better understand HRI systems and can be applied to improve the interaction outcome. In the context of multi-agent HRI systems, we find it useful to distinguish 'direct' and 'indirect' interactions. Direct interaction between a sender and recipient occurs when the sender actively (i.e., intentionally and explicitly) communicates to the recipient using any mode, verbal or non-verbal. A third party may also receive information from this direct communication, and we refer to this 'eavesdropping' as indirect interaction. Taking the example of a study presented in Tan et al. (2019), a robot trying to transfer task information to another robot (e.g., through speech) is an example of direct interaction, whereas a human observing the robots interacting with each other is an example of indirect interaction.

Making this distinction can help us understand if and how much the agents in the system are actively trying to communicate with each other, or if they are primarily just co-existing in the same environment while observing each other. Modelling the indirect interactions is necessary to understand how direct interactions between any two agents can affect the behavior of others, e.g., Rifinski et al. (2021); Tan et al. (2019); Short and Mataric (2017).
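The direct/indirect distinction can be attached to interaction-graph edges as a label. The sketch below is one hedged way to do so: the names `Interaction` and `interactions_from_message` are hypothetical, and it assumes, for simplicity, that each observer receives information from the sender of the direct message only.

```python
from typing import Iterable, List, NamedTuple


class Interaction(NamedTuple):
    sender: str
    receiver: str
    kind: str  # "direct" or "indirect"


def interactions_from_message(
    sender: str, recipient: str, observers: Iterable[str] = ()
) -> List[Interaction]:
    """Return the edges created by one intentional communication act.

    The addressed recipient gets a direct edge; every third party that
    merely observes the exchange gets an indirect edge from the sender.
    """
    edges = [Interaction(sender, recipient, "direct")]
    for observer in observers:
        if observer not in (sender, recipient):
            edges.append(Interaction(sender, observer, "indirect"))
    return edges


# A robot passes task information to a second robot while a human watches,
# loosely following the scenario described for Tan et al. (2019).
edges = interactions_from_message("robot_1", "robot_2", observers=["human"])
```

Keeping the label on the edge makes it possible to ask, for any agent, how much of its incoming information was addressed to it and how much was merely observed.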

Examples of systems with different types of interactions are given in Table 1. In the remainder of this section, we discuss the above-mentioned differences under two key aspects of interaction style: 1) Interaction model present, and 2) Communication modalities used in the system.

## 4.1. Interaction Models

The presence of multiple agents doesn't necessarily mean that each interaction in the system also involves multiple agents. Interactions in a multi-agent HRI system can be implemented using different-sized communication channels (i.e., number of agents connected via a single interaction). In a multi-agent system, an agent has the capability of interacting with one or multiple agents at the same time, resulting in one-to-one or one-to-many interaction models (Fortunati et al., 2018). Examples of several multi-agent HRI systems with different interaction models are shown in Table 1 using the interaction graph structure defined in Section 2. These interaction models are an extension of the interaction types presented in Yanco and Drury (2004). The presented models are able to cover a variety of interactions that are possible in multi-agent HRI systems, especially in social interactions. In addition, these models let us specify the direction of information flow between agents in the system and can illustrate differences between human-to-robot and robot-to-human interactions.

<!-- image -->

<!-- image -->

<!-- image -->

<!-- image -->

<!-- image -->

<!-- image -->

<!-- image -->

Table 1: Examples of multi-agent human-robot systems with respective interaction graphs. The orange solid arrows signify direct interactions between agents (e.g., explicit communication), while dashed blue arrows denote the indirect interactions (observational). A maximum of two humans and robots are shown in the figures, but interaction graphs can be drawn for a larger number of agents in a similar way. Note that this table only shows examples of possible interaction graphs and is not an exhaustive list.

|     | Study | Interaction Graph | Remarks |
|-----|-------|-------------------|---------|
| (a) | Short and Mataric (2017); Jung et al. (2020) | | Robot actions are independent of human teammates. Humans decide their actions based on robot's actions and potential discussion with the other human teammate. |
| (b) | Claure et al. (2020) | | Robot actions are independent of human teammates. Humans decide their actions based on robot's actions, and observing actions of the other human teammate (without an explicit communication). |
| (c) | Tan et al. (2019) | | A human user directly interacts with one of the two robots, which then conveys the required information to the second robot while the user observes the two robots. |
| (d) | Swamy et al. (2020); Rosenfeld et al. (2017) | | A human operator controls each robot individually. Human's selection of robot is decided either by observation of robots' states or based on the suggestion shown by the interface (G). |
| (e) | Dahiya et al. (2022) | | A central computational entity (G) collects data of all robots and then allocates each human operator to the robots that need assistance. Operators directly assist each robot individually. Operators are allocated only by the central computational entity and not by any direct interaction from the robots. |
| (f) | Patel and Pinciroli (2020) | | Commands of multiple human operators are converted to actions required by individual robots. Operators observe each robot directly and can also interact with each other to resolve conflicts. |
| (g) | Correia et al. (2018) | | Two teams, each consisting of one human and one robot, play a competitive card game. Each agent decides on their actions based on observations of other agents' actions. The only direct communication in the system occurs when a robot speaks to its partner to convey its emotions. |

Figure 5: Example of a one-to-one interaction model in a single-human - multi-robot system (Yedidsion et al., 2019). Multiple robots connected via a network guide a human user, who interacts with robots located in different locations.

<!-- image -->

<!-- image -->

<!-- image -->

Note: These models, in their basic form (Fig. 1), do not describe many aspects of the complete system (e.g., they don't show embodiment, roles or heterogeneity among agents, or modes of communication). However, it is possible to add some of this information with slight modifications. In Table 1, we show one such addition, i.e., whether an interaction is direct or indirect, using solid and dashed arrows respectively.

As observed from Table 1, interactions between different pairs of agents in a system can be implemented using different models (one-to-one, one-to-many etc.). It is also possible that the interaction model varies with the type of agents involved, i.e., different models for human-to-robot, robot-to-human, robot-to-robot and human-to-human interactions. For instance, a human may be able to give commands to a single robot at a time (one-to-one) but may observe behavior and receive messages/information from all robots at once (one-to-many) (Swamy et al., 2020; Khasawneh et al., 2019). These models are discussed below under three categories denoting how many agents are connected at each end of the communication channel.

## 4.1.1. One-to-one

The one-to-one interaction model is similar to the one found in dyadic human-robot systems, where one robot is interacting with one human teammate at any given time, even though there may be other agents present in the environment. This form of interaction is commonly seen in systems where one can afford the commands/instructions for an agent to be independent from others (e.g., in systems where coordination among agents is not required). A common application is seen in systems where a human operator manages a team of multiple robots by giving each of them separate commands. While a fleet of autonomous robots acts in separate environments, a human operator can monitor their states remotely, and can intervene to help a robot via teleoperation when the robot encounters a challenging state (Swamy et al., 2020; Rosenfeld et al., 2017).

Human commands to a robot swarm can be implemented with both a one-to-one and a one-to-many interaction model. One common strategy under the one-to-one model is the leader-follower approach, where human users directly control the leader robots, and the control signals for the rest of the swarm are governed by the leaders' motions (Setter et al., 2015). Human-swarm interaction can also be designed to follow a one-to-many interaction model, as discussed later in Sec. 4.1.2.

One can easily identify the interaction model by checking if an interactive action of an agent is targeted at one or multiple agents. For instance, in the system shown in Fig. 1, even though the robot interacts with multiple humans throughout the duration of its task, at any given point the robot's actions are directed towards a single human (Jung et al., 2020). Figure 4(b) shows a system consisting of two robots with a human co-worker. In this system, the human interacts with each robot separately and none of the interactions involve all three agents at once (Karami et al., 2020). We also see examples of such HRI systems in an office setting, where even though multiple humans or robots are present, each interaction is between a single human and a single robot. An example can be a system with a robot going around in an office asking help from human users at different times (Rosenthal et al., 2012). In Fig. 5, we see a person being guided from one robot to another by an indoor guidance system, where the human interacts with only a single robot at a time (Yedidsion et al., 2019).

In settings where agents are located in the same environment, the system also has indirect interactions, which occur when an agent observes other agents and the interactions taking place between them. These indirect interactions still follow a one-to-one model, as the information flow from each observed agent is directed towards the observer.

## 4.1.2. One-to-many

The one-to-many interaction model can be seen in situations where one robot is directly interacting with multiple humans at once (Nanavati et al., 2020; Correia et al., 2018; Alves-Oliveira et al., 2019), and where one human is simultaneously interacting with multiple robots (Lim et al., 2018; Guo et al., 2009; Arkin and Ali, 1994; Jones and Snyder, 2001). Note that while some works distinguish between many-to-one and one-to-many models based on the direction of control commands (e.g., Lim et al., 2018), in this article all such systems are discussed together under a one-to-many model.

Figure 6: Examples of different interaction models: a) One-to-many: A system where a human controls multiple robots at once using hand gestures (Kim et al., 2020). The gestures are interpreted and translated into group commands for the robots. b) One-to-many: A robot playing a table-top game where it interacts with two human teammates (Alves-Oliveira et al., 2019). c) Many-to-many: Multiple humans giving commands to multiple robots using a tablet interface (Patel and Pinciroli, 2020), where each human operator can give commands to multiple robots simultaneously. Any form of conflict in commands is then resolved through the interface or verbal communication among the operators.

<!-- image -->

<!-- image -->

<!-- image -->

A simple way of implementing this interaction model is to use verbal communication or gestures to issue group commands to multiple agents at once, e.g., Kim et al. (2020). Consequently, interactions under this model are common in social settings, where the model is a natural fit because the robots and humans are co-present in the same environment. Studies such as Nanavati et al. (2020) and Correia et al. (2018) present systems where robots' actions and behavioral cues are directed towards the whole group of human teammates. Another similar application of this model is seen in classroom settings where a robot acts as a teacher/tutor for a group of students to promote group-based learning (Leite et al., 2015; Alves-Oliveira et al., 2019; Belpaeme et al., 2018). A robot interacting with humans in a public space also naturally calls for a one-to-many model (Fortunati et al., 2018).

In industrial or military-oriented applications, where humans and robots might be located in separate environments, interactions under the one-to-many model are usually implemented using a group command node (see interaction graphs in Table 1; shown as node (G)). This group command node is an interpreter (an interface or an algorithm) that converts communication from one or more agents into the information required by each individual agent before relaying that information to its recipient(s). For example, when controlling multiple ground or aerial robots, a human operator can simply give group commands to the fleet instead of telling each robot what to do, while the interpreter converts this group command into the required actions for each robot (Lim et al., 2018; Ayanian et al., 2014). As mentioned earlier, implementation of human interactions with a swarm can also make use of the one-to-many model. In order to make human control efficient, a group command node is usually implemented to convert human intentions/commands into control signals for all robots (Kolling et al., 2015; Podevijn et al., 2013).
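As an illustration of the group command node described above, the following minimal sketch expands one group command into one waypoint per robot. The command format (`GroupCommand`), the line-formation rule and all names are hypothetical and chosen only for illustration; they are not taken from any of the cited systems.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class GroupCommand:
    """A single command issued to the whole fleet (hypothetical format)."""
    goal: tuple[float, float]  # where the centre of the formation should go
    spacing: float             # distance between neighbouring robots


def interpret(command: GroupCommand, robot_ids: list[str]) -> dict[str, tuple[float, float]]:
    """Group command node (G): turn one group command into one waypoint per robot.

    Assumption: robots are placed on a horizontal line centred on the goal.
    """
    n = len(robot_ids)
    gx, gy = command.goal
    waypoints = {}
    for i, rid in enumerate(robot_ids):
        # Offset of robot i from the centre of the line formation.
        offset = (i - (n - 1) / 2) * command.spacing
        waypoints[rid] = (gx + offset, gy)
    return waypoints


# One human command in, three robot-specific commands out.
fleet = ["r1", "r2", "r3"]
per_robot = interpret(GroupCommand(goal=(10.0, 5.0), spacing=2.0), fleet)
```

The operator issues a single command regardless of fleet size; only the interpreter needs to know how many robots there are.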

## 4.1.3. Many-to-many

In multi-human - multi-robot systems, it is also possible to have multiple one-to-many interactions taking place among different agents, resulting in a many-to-many interaction model. This is the most unconstrained interaction model that can be implemented in a human-robot system. Although it is not very common to find such interaction models in the literature, studies by Patel et al. (2021), Zhang and Vaughan (2016) and Tews et al. (2003) have presented interaction interfaces to facilitate many-to-many interactions.

A common way of enabling interactions among the agents in such systems is through a proxy architecture aimed at facilitating collaboration of humans and robots with varying levels of autonomy (Mostafa et al., 2019; Ramchurn et al., 2015; Freedy et al., 2008). Under this setting, each agent interacts with a proxy, which can be part of a centralized proxy architecture or connected to a network of proxies. Such proxies are usually responsible for receiving inputs from different agents, interpreting the messages/commands and then relaying the relevant information to other agents. Augmented Reality (AR) based interfaces have also been used for making such interactions more user-friendly by providing easier methods of giving commands. For example, the system shown in Fig. 6(c) uses an AR interface on separate tablets to enable multiple users to interact with multiple robots (Patel and Pinciroli, 2020). Without such proxies, the human users in the system must resolve any conflicts (in commands/decisions) by themselves and work out a common strategy (Hwang and Wu, 2014).

Looking at the interaction graphs in Table 1, many-to-many interactions can be represented as a group node (G) having multiple incoming and multiple outgoing edges. For example, in Table 1(e), a central computational entity - shown as group node (G) - collects data from multiple robots and relays allocation information to multiple human operators. In Table 1(f), the interface enables multiple users to input control commands and then decides actions for individual robots. A many-to-many model provides a natural and efficient interaction setup in a multi-agent system as there are minimal constraints on when agents are allowed to communicate with each other, and information from multiple agents can be relayed to respective recipients simultaneously.
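The graph reading above can be sketched in a few lines of code. The snippet below is a minimal illustration, assuming the interaction graph is given as a list of directed `(sender, receiver)` edges and that the group node is labelled `"G"`; the function name and labels are hypothetical. Following the convention of this article, a many-to-one pattern is reported as one-to-many.

```python
def classify_group_node(edges, group="G"):
    """Label the interaction model around a group node from directed edges.

    edges: iterable of (sender, receiver) pairs; each edge points in the
    direction of information flow, as in the interaction graphs.
    """
    edges = list(edges)
    senders = {s for s, r in edges if r == group}    # agents feeding the group node
    receivers = {r for s, r in edges if s == group}  # agents served by the group node
    if not senders or not receivers:
        return "no relaying group node"
    if len(senders) > 1 and len(receivers) > 1:
        return "many-to-many"
    if len(senders) > 1 or len(receivers) > 1:
        return "one-to-many"  # many-to-one is folded into one-to-many
    return "one-to-one"


# One operator commanding three robots through an interpreter.
fleet_control = [("h1", "G"), ("G", "r1"), ("G", "r2"), ("G", "r3")]

# Two operators sharing an interface that drives two robots.
shared_interface = [("h1", "G"), ("h2", "G"), ("G", "r1"), ("G", "r2")]

labels = (classify_group_node(fleet_control), classify_group_node(shared_interface))
```

This check only looks at the edges incident to the group node; direct edges between agents would need the per-action test described for the one-to-one model.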

## 4.2. Communication Modalities and Interfaces

Communication modalities refer to the modes through which different agents interact in the system (e.g., speech, haptics, screen etc.). Similar to the conventional dyadic HRI systems, a multi-agent system can have agents communicate through verbal or non-verbal modes, or a combination of multiple modes at once. The chosen modes of communication depend on the environment, application of the system, proximity of different agents and their interaction capabilities. Each modality has its own advantages, implementation requirements and limitations. As we note below, proximity has a major influence on which communication modalities are feasible in a system to enable efficient communication among humans and robots. Therefore, to discuss the different types of communication modalities used in multi-agent systems, we find it useful to group the communication modalities based on proximity of the agents.

## 4.2.1. Remote communication

For the systems designed to operate in environments that can be potentially dangerous or inaccessible to humans, a screen-based interface is ubiquitous for enabling remote communication among humans and robots. Human control of a fleet of UAVs (Nam et al., 2017; Drury et al., 2006), supervision during search and rescue tasks (Khasawneh et al., 2019; Scholtz et al., 2004; Murphy, 2004) and teleoperation of underwater robots (Wang et al., 2014) are some of the applications that make use of remote communication techniques.

Figure 7: Examples of interface design for control of multiple robots: a) Interface of a search-and-rescue task with multiple robots (Rosenfeld et al., 2017). Such screen-based interface designs are common in applications where a human user controls multiple remote robots. b) A mixed-reality based multi-robot control framework. The figure shows relation of coordinate frames of main entities of the system, including one human and three robots (Ostanin et al., 2021). c) Teleoperation of multiple mobile robots using a joystick and a haptic device (Hong et al., 2017).

<!-- image -->

<!-- image -->

<!-- image -->

Remotely interacting with robots is shown to result in higher cognitive workload in human users (Gittens, 2021), which increases further with an increasing number of robots (Fincannon et al., 2013; Adams, 2009). Furthermore, communicating plans to team supervisors and maintaining their situational awareness are important challenges faced while collaborating with remote robots (Hastie et al., 2019). Therefore, the development of interaction interfaces that enable efficient and reliable human interaction with multiple remote robots is an important area of research in remote HRI (Roldán et al., 2017; Rule and Forlizzi, 2012; Driewer et al., 2007). In systems with a disproportionately high number of robots compared to human operators/supervisors, it becomes difficult for the human users to interact with all robots efficiently due to challenges related to perception, workload and situational awareness (Fincannon et al., 2013). Therefore, intelligent interface designs are required that can mitigate these challenges (Hussein et al., 2018). So far, the literature mostly consists of screen-based interface designs, as they are the most convenient to implement in such systems, as seen in many of the studies discussed in Sec. 3.1.2. Conventionally, many multi-robot interfaces share several design features, such as camera feeds of multiple robots in small cards along the screen edge, an enlarged view of one selected robot and a map, or an overview, of all robots in the environment (Rosenfeld et al., 2017; Chien et al., 2018), as shown in Fig. 7(a). Recently, immersive interfaces using Virtual Reality (VR) or Augmented Reality (AR) have gained popularity in multi-robot systems (Ostanin et al., 2021; Roldán et al., 2017), and have been shown to improve operators' situational awareness and decrease workload (Frank et al., 2017). Some systems also augment the interaction interface with multiple communication modes (e.g., haptics) to improve interaction outcome (Hong et al., 2017) (Fig. 7(c)).

## 4.2.2. Proximate communication

In systems where humans and robots are located in the same environment, many more options are available to system designers in terms of choosing modes of communication. These include communication via auditory channels (speech and non-speech audio), visual channels (gestures, facial expressions, body postures and gaze), and physical channels (touch and force). The human guidance system presented by Yedidsion et al. (2019) (shown in Fig. 5) uses simple verbal instructions to communicate with the human. Figure 6(a) shows a human user controlling multiple robots using gestures (Kim et al., 2020).

It is also possible to use multiple modes at once, resulting in a multi-modal communication architecture. Pourmehr et al. (2014) present a system that uses a combination of haptic and verbal inputs from a human user to control multiple UAVs. In the system presented by Gromov et al. (2016), a human user can communicate with robots using speech and gestures, while robots provide visual and verbal feedback to the user.

Proximate communication is more commonly seen (and naturally present) in social robotics, as it facilitates the more personal interactions required in a social setup. This is expected as the meaning conveyed through an interaction significantly depends on social cues (verbal and non-verbal), and not solely on the concrete message being communicated (Feine et al., 2019). When humans and robots are co-located in the same environment, the role of non-verbal communication becomes important to consider. For example, the robot shown in Fig. 4(c) is used to enhance Human-Human Interaction using only gestures, without interfering with the interaction (Erel et al., 2021b). There have been several other studies that investigate different cues (expressions and movements) as a mode of communicating a robot's intent, emotions and information more clearly in a group (Correia et al., 2018; Faria et al., 2017). Moreover, the spatial placement of agents, in relation to each other and the environment, also affects human behavior in a group setting (Rios-Martinez et al., 2015; Yamaoka et al., 2009).

When the agents are located in the same environment, it is also likely that interaction between any two agents has an effect on behavior and future interactions of other agents. This is an example of indirect/implicit communication between agents. Thus, Human-Human Interactions can be affected by Human-Robot Interactions (Joshi and Šabanović, 2019; Short et al., 2017; Kim et al., 2013). Likewise, Robot-Robot Interactions can also impact future human interactions with the robots (Yang and Kwon, 2012; Tan et al., 2019), or can affect a human's psychological state (Erel et al., 2021a). Moreover, the embodiment or mere presence of a robot may influence human behavior and their perception of the robots (Druckman et al., 2021; Shiomi et al., 2020).

Communication modalities can also vary depending on the agents involved and the direction of communication. For instance, Berg et al. (2019) present an interaction system where the human-to-robot communication channel is realized through gestures and eye tracking while the information from the robot to the human is communicated using a projection. In the system presented by Rosenthal et al. (2012), the robot communicates its queries using speech and receives human input using a visual interface on a laptop. Also, it is common to see different communication modalities between Human-Robot and Human-Human Interactions (Kruijff et al., 2014).

## 4.3. Key Observations

In the surveyed literature, we looked into different forms of interaction styles present in multi-agent human-robot systems. In terms of interaction models, a couple of trends are noticed. Even with multiple agents present in the system, the one-to-one interaction model is the most prevalent, signifying that agents mostly interact with only one other agent at a given time. The one-to-many interaction model has also gained popularity and has been useful to enable more efficient interactions in multi-agent systems. Such systems are mostly seen in the area of social robotics and co-located settings. In order to realize more natural and smooth interactions among humans and robots, a many-to-many model is desirable. We are beginning to see such implementations and hopefully this will mature into a strong interaction model in human-robot systems.

With regard to communication modalities, in systems where humans interact with multiple robots, a screen-based interaction interface is the most common (and possibly the most practical) choice. With the advent of Virtual Reality (VR) and Augmented Reality (AR) technologies, future multi-agent Human-Robot Interaction systems may shift from conventional two-dimensional screen-based interfaces to more immersive ones. It is still unclear whether using VR and AR will improve system usability in diverse settings, and more research is required to investigate the applicability of these technologies in multi-agent HRI systems for different applications. In co-located and social settings, multi-modal communication is becoming more popular and can enable more natural and safer interactions between humans and robots, without the need to enforce physical separation.

## 5. Computational Characteristics: Robot Control

So far in this article, we have discussed the perceptible aspects of an HRI system, which describe 'who is present in the system and who is interacting with whom'. Now, we look into computational aspects of the system, specifically how system designers can choose to influence/control the behavior of different agents in the system. In an HRI system, behavior and actions of robots are controlled directly, either governed by an optimization-based action-policy or via predefined/rule-based methods, or a combination of both. On the other hand, human behavior can only be influenced indirectly through the robots: via their actions, their interactions with the humans, other robots or the environment, or explicit communication. Some systems make use of human behavioral models to decide robots' actions while others take a model-free approach.

In this article, our main focus is on multi-agent systems instead of the particulars of a study. Therefore, even though both human behavior and robot control are relevant attributes under the aspect of computational characteristics, in this section we primarily discuss robot control, and only include a discussion of human behavior where required. Specifically, we look into the two principal types of robot control implemented in multi-agent HRI systems: 1) Optimization-based control, and 2) Predefined and rule-based control. We also discuss the differences that the presence of multiple agents brings to the system.

## 5.1. Optimization-based control

Optimization-based control refers to a framework of computing robot actions to optimize one or more performance-defining parameters established for the system. The performance parameters often represent factors like time of task completion (Hari et al., 2020), cost incurred (resources spent) (Dahiya et al., 2022) and reward earned (value produced) (Swamy et al., 2020). In the multi-agent HRI literature, to pose the mathematical optimization problem, we find examples of systems being modeled in the form of time-series (Wang et al., 2014), outcome probabilities (Dahiya et al., 2022; Sellner et al., 2006) or Dynamic Bayesian Network (Fooladi Mahani et al., 2020). Machine-learning and other data-driven methods are also some of the tools used in optimization-based control, e.g., Nam et al. (2019); Swamy et al. (2020); Li et al. (2021). Such system models are often motivated by literature from different areas of behavioral study such as psychology, economics and social sciences. For example, Shannon et al. (2016) present the Pew model from psychology, Swamy et al. (2020) make use of the Luce Choice model from economics, and Bera et al. (2018a) use entitativity related psychology research in their system.

Optimization-based control is seen in studies where the system behaviour can be modelled reliably using existing theories, or where researchers are trying to validate a new approach for the same. These systems also require that there exist quantifiable and measurable parameters that can be used as optimizing metrics (such as task completion time, error rate, etc.). When implementing such robot control in multi-agent systems, there are a few considerations to handle. When multiple agents are present in the system, the required information access and the number of possible interactions may increase exponentially with the number of agents. This problem has motivated a whole segment of research on the development of computationally efficient decision-making and control techniques for multi-agent HRI systems. Among other applications, this research is seen in systems enabling a robot to influence a team of multiple humans (Li et al., 2021), enabling multiple robots to safely navigate among other robots and humans (Bajcsy et al., 2019), predicting human behavior while supervising multiple heterogeneous unmanned vehicles (Boussemart and Cummings, 2011), and finding task allocation and sequencing for multiple robots travelling to collaborate with humans (Hari et al., 2020). When dealing with a large number of robots, human users' ability to maintain awareness of the system's state might be insufficient (Olsen Jr and Wood, 2004; Chien et al., 2013) and thus the users may benefit from a decision support system (DSS). Applications of such DSSs are seen in systems enabling a human operator to assist multiple remote robots (Swamy et al., 2020), and in allocating operators to multiple navigating robots (Dahiya et al., 2022; Rosenfeld et al., 2017; Malvankar-Mehta and Mehta, 2015). There are also several studies on human supervision of fleets or swarms of remote robots. Lewis (2013) presents a review of systems enabling such supervision under the construct of command complexity, while Kolling et al. (2015) review research on human control of robot swarms.
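To make the idea of an optimizing decision support system concrete, the following toy sketch allocates a limited number of operators to robots so that the total expected delay is minimal. It is an illustrative assumption only and not the method of any cited study: the delay values, the function name `allocate_operators` and the exhaustive search are hypothetical, and the sketch assumes every available operator is always assigned.

```python
from itertools import combinations


def allocate_operators(expected_delay, assisted_delay, num_operators):
    """Choose which robots receive operator help to minimise total expected delay.

    expected_delay[r]: expected delay of robot r when left on its own.
    assisted_delay[r]: expected delay of robot r when an operator helps it.
    Exhaustive search over all subsets of the given size, so this is only
    practical for small fleets.
    """
    robots = sorted(expected_delay)
    k = min(num_operators, len(robots))
    best_cost, best_set = float("inf"), ()
    for helped in combinations(robots, k):
        cost = sum(
            assisted_delay[r] if r in helped else expected_delay[r]
            for r in robots
        )
        if cost < best_cost:
            best_cost, best_set = cost, helped
    return set(best_set), best_cost


# Hypothetical delays in seconds for a three-robot fleet and two operators.
expected = {"r1": 30.0, "r2": 12.0, "r3": 45.0}
assisted = {"r1": 10.0, "r2": 8.0, "r3": 20.0}
helped, total = allocate_operators(expected, assisted, num_operators=2)
```

The number of candidate subsets grows combinatorially with fleet size, which is one reason the literature above focuses on computationally efficient allocation techniques.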

## 5.2. Predefined and rule-based control

The predefined and rule-based control methods provide a convenient way of implementing robot control in an HRI system without the use of computational models or data. This includes techniques like Wizard of Oz, expert knowledge-based if...then rules or pre-specified sequences of actions. These methods help simplify the robot's decision-making and allow the researchers to focus on other aspects of Human-Robot Interaction that the system is designed to investigate, e.g., the outcome of a controlled interaction (Tan et al., 2019; Yang and Kwon, 2012).

In some systems, direct use of an optimization-based control is not possible (e.g., due to lack of a numerical model), and a subjective interpretation of system events is required based on expert knowledge. Such systems often have robot action policies implemented as if...then rules instead of numerically computed conditions. For example, Correia et al. (2018) present an algorithm to generate robot emotions given system events, based on knowledge from psychology. Rule-based control is more commonly seen in studies that deal with subjective metrics of outcome (e.g., human perception of robots, workload, etc.). Note that the optimization-based and rule-based control methods are not mutually exclusive. It is possible to implement a combination of an optimizing policy with an expert knowledge-based model. For example, in the system shown in Fig. 6(b), the robot's behavior is decided by a hybrid controller that combines manually-encoded behavioral rules and a machine learning-based mapping function (Alves-Oliveira et al., 2019).
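A rule-based policy of the if...then kind described above can be sketched as an ordered rule table. The events, responses and rule ordering below are invented for illustration and do not reproduce the algorithm of Correia et al. (2018) or any other cited system.

```python
# Hypothetical rule table: each entry is (condition on the event, robot response).
# Rules are checked in order and the first match wins.
RULES = [
    (lambda e: e.get("type") == "round_end" and e.get("team_won") is True, "joy"),
    (lambda e: e.get("type") == "round_end" and e.get("team_won") is False, "sadness"),
    (lambda e: e.get("type") == "partner_mistake", "encouragement"),
]


def select_response(event, default="neutral"):
    """Return the response of the first rule whose condition matches the event."""
    for condition, response in RULES:
        if condition(event):
            return response
    return default  # no rule fired


responses = [
    select_response({"type": "round_end", "team_won": True}),
    select_response({"type": "round_end", "team_won": False}),
    select_response({"type": "partner_mistake"}),
    select_response({"type": "idle"}),
]
```

Because the rules are plain data, an expert can add or reorder them without touching the selection logic. In a hybrid controller, a learned mapping could take the place of the `default` branch.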

While the use of a rule-based control can enable a system to decide the robot's behavior based on some feedback from the environment, it is also possible to pre-specify robot behavior without any decision-making component. This robot control method allows studies to manipulate different test conditions where feedback from the environment is not required. Such implementation is commonly seen in studies which investigate the effects of specific robot behaviors on human users, e.g., users' perception of robots (Fraune et al., 2017), knowledge acquisition (Fernández-Llamas et al., 2020), group emotions (Bera et al., 2018b) and perceived legibility (Faria et al., 2021). Referring back to the interaction graphs defined in Section 2, this type of robot control results in unidirectional edges from robots to humans.

Under predefined and rule-based control methods, the Wizard of Oz (WoZ) is a common technique of implementing human-assisted robot control in multi-agent (mainly multi-human) systems. In this technique, a hidden human 'Wizard' (teleoperator) controls some or all of the robots' actions, speech, gestures and behavior, unknown to other human users in the system (Riek, 2012). Depending on the level of robot autonomy, the Wizard can be used to replace certain parts of the robots' perception or cognitive capabilities (e.g., Johansson et al. (2013); Shiomi et al. (2009)), thus overcoming the robots' limitations. It is also possible to have a mixed-initiative approach where either the robots or the human user can take control of the robots' actions (e.g., Jiang and Arkin (2015); Khasawneh et al. (2019); Wang et al. (2014)).

Note on Group/Individual behavioral parameters: Regardless of the type, robot control in multi-agent systems can be implemented in two ways: either based on parameters of individual agents or based on the group as a whole. Taking the parameter of trust as an example in a system with a single human and multiple robots, one can either plan robots' actions by considering trust of the human in each individual robot (Wang et al., 2018), or one can consider the human's trust in the whole robot team (Liu et al., 2019). Other examples of robot control based on individual parameters can be seen in studies with parameters like engagement (Leite et al., 2016) and attention (Yang and Kwon, 2012). Such individual modelling provides a simple method to expand dyadic HRI research to the multi-agent setting, and to test any differences between the two. Group-based robot control has its own benefits. Often, the humans in the system are not working independently of other agents. So, one may find it useful to decide robot control as a function of group-based parameters, something not possible in dyadic systems. Such control is particularly useful in systems where agents act as a coordinating team, or in applications where performance or behavior of the whole group is central. For example, a robot can express group-based emotions to increase its likability among human teammates (Correia et al., 2018), or use audio features to affect group entitativity (Savery et al., 2021). Figure 3(a) shows a system where the robot determines the best way of approaching a group of humans by extracting social cues shown by the group (Tseng et al., 2016).

## 5.3. Key Observations

In this section, we looked at two types of robot control, one that aims to optimize a performance parameter and another that is defined using expert knowledge.

Regarding the implementation of robot control, the Wizard of Oz technique is still the most widely used control method for studies designed to evaluate a human's response to a particular type of interaction with a single robot (or a limited number of robots). The prevalence of the Wizard of Oz technique is an indicator that a majority of the research in multi-agent Human-Robot Interaction pertains to investigating human behavior in such systems.

Optimization-based robot control has mainly been a part of computational research in HRI and has enabled the development of efficient algorithms for large systems. Even though we find a good number of papers implementing such optimizing control policies on real systems, most of the computational research in multi-agent HRI is still seen in theoretical studies. This is expected, as real-world implementation and testing of large systems require significantly more resources. However, we hope that as robotic systems become more accessible, the computational research in multi-agent HRI will gain traction with real systems.

We also discussed how robot control in multi-agent systems can be based on parameters of either individual agents or of the group as a whole. While individual agent-based control enables the extension of dyadic research to the multi-agent case, group-based control allows one to study and influence the behavior of the whole group without requiring exact specifications for individual agents. Group-based control has also enabled more natural robot operation when interacting with a group of humans. In the domain of social robots, studies that try to improve understanding of behavioral parameters of the team/group as a whole are becoming popular. Future work in this area is essential in normalizing robot presence in human society.