\begin{figure}
    \centering

    \subfigure[HTN representation of the task.]{\label{fig:posing task htn representation}%
        \includegraphics[width=.65\textwidth]{Image/posing HTN Domain.jpg}
    }%
    \subfigure[A demonstration of the task environment in VR and the arrow manipulations.]{\label{fig:posing task scene arrow avatar}%
        \includegraphics[width=.35\textwidth]{Image/scene_arrow_avatar_new.jpg}
    }
    % \begin{subfigure}{\textwidth}
    %     \begin{subfigure}{.65\textwidth}
    %         \includegraphics[width=\textwidth]{Image/posing HTN Domain.jpg}
    %         \captionsetup{width=0.8\textwidth}
    %         \caption{}
    %         \label{fig:posing task htn representation}
    %     \end{subfigure}%
    %     \begin{subfigure}{.35\textwidth}
    %         \begin{subfigure}{\textwidth}
    %             \includegraphics[width=\textwidth]{Image/scene overview.jpg}
    %             \captionsetup{width=.8\textwidth}
    %             \caption{}
    %             \label{fig:posing task scene}
    %         \end{subfigure}
    %         \begin{subfigure}{\textwidth}
    %             \includegraphics[width=\textwidth]{Image/arrow-overview.jpg}
    %             \captionsetup{width=.8\textwidth}
    %             \caption{
    %             % Interactive arrows attached to joints for manipulating the Puppet
    %             }
    %             \label{fig:posing task arrows avatar}
    %     \end{subfigure}
    %     \end{subfigure}
    % \end{subfigure}
    % 
    \subfigure[A summary of the implementation of the framework.]{\label{fig: posing task algo branch to var}%
      \includegraphics[width=.7\textwidth]{Image/branch-var-observation.JPG}
    }%
    % \begin{subfigure}{.7\textwidth}
    %     \includegraphics[width=\textwidth]{Image/branch-var-observation.JPG}
    %     \captionsetup{width=0.8\textwidth}
    %     \caption{
    %     % Intention variables are created only for non-trivial levels along the branch.
    %     }
    %     \label{fig: posing task algo branch to var}%         
    % \end{subfigure}%
    % 
    \subfigure[An illustration of cues.]{\label{fig:posing task cue capability}%
      \includegraphics[width=0.3\textwidth]{Image/cue-capabilities.jpg}
    }%
    % \begin{subfigure}{.3\textwidth}
    %     \includegraphics[width=\textwidth]{Image/cue-capabilities.jpg}
    %     \captionsetup{width=.8\textwidth}
    %     \caption{
    %     % Cuing capabilities of Sophia.
    %     }
    %     \label{fig:posing task cue capability}
    % \end{subfigure}
    % 
    \caption{Overview of the VR posing task and the implementation of the framework.}
    \label{fig:posing task task-overview}
\end{figure}

\section{Experiment}\label{sec: exp overview}
We designed experiments in virtual reality (VR) to test two hypotheses. First, a robot equipped with the proposed framework acts as an effective helper and offers timely and precise guidance. Second, incorporating gaze information significantly improves the performance of the proposed framework.

\subsection{VR Posing Task}\label{sec:vr posing task}
The proposed framework was implemented in a VR posing task. Human agents were asked to move a puppet through a sequence of spatial locations, with the puppet assuming a different pose at each location. The task is motivated by an ongoing collaboration with a local hospital that seeks to use humanoid robots to guide patients through various check-in procedures, such as an entrance interview and the measurement of vital signs.
% \figureref{fig:posing task architecture} gives an overview of the system architecture underlying the task. 
\figureref{fig:posing task scene arrow avatar} shows the task environment, which consists of a puppet, a set of reference objects indicating locations and poses, and a virtual \href{https://www.hansonrobotics.com/sophia/}{Sophia robot}. The posing task is carried out by the human subject. Sophia serves as a guide, providing a description of the task at the start of each trial and cues in the form of spoken utterances and body gestures (e.g., pointing) while the agent is performing the task. \figureref{fig:posing task htn representation} presents the posing task as an HTN. To complete the \emph{root task}, the agent needs to achieve five \emph{sub-goals}. Each sub-goal requires two tasks: translating the puppet to a desired location and controlling the puppet to mimic a reference pose. For ease of description, we treat both as \emph{pose mimicry} tasks. A pose mimicry task in turn requires the completion of a set of \emph{joint mimicry} tasks. A joint mimicry task is completed by executing a sequence of \emph{joint actuation} tasks around different axes and by different angles. The task ``Actuate'' at the joint actuation level is the action of this HTN, and it is implemented by the task-related movement of \emph{arrow manipulation}. A set of interactive arrows similar to those in \citep{Leeper2012StrategiesGrasping} is attached to each joint and can be manipulated by the agent via the VR controller (\figureref{fig:posing task scene arrow avatar}). Every button press on the controller causes the selected arrow to actuate the joint around an axis by a fixed amount; hence, an ``Actuate'' action requires repeated manipulations of the correct arrow. \figureref{fig:posing task cue capability} shows the cues Sophia might offer for each non-trivial task. For a livelier view of the cues, see \href{https://drive.google.com/file/d/10vEPwjNTcMf0Te-EIQa9q-jCrA8kvven/view?usp=sharing}{this demo video}. 
A demo of a human executing the posing task can be \href{https://drive.google.com/file/d/1197bPN0diBbWJ2VpSRCT5UiwdGJpPE5f/view?usp=sharing}{found here}. The VR posing task is implemented in Unity under a free personal license. The Unity-based assets we used are listed in \apdxref{apx:unity asset list}.
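
To illustrate why an ``Actuate'' action requires repeated arrow manipulations, consider the following rough sketch. The symbols are introduced here purely for illustration and are not taken from the implementation. Suppose one button press rotates a joint by a fixed increment $\delta > 0$ about the selected axis, and let $\theta$ and $\theta^{*}$ denote the joint's current and target angles about that axis. Under the simplifying assumption that the agent presses only the correct arrow and never overshoots, the number of button presses needed for a single ``Actuate'' action is approximately
\[
    n \;=\; \left\lceil \frac{\left|\theta^{*} - \theta\right|}{\delta} \right\rceil .
\]
Any tolerance allowed when matching the reference pose would reduce this count, so the expression should be read as an upper estimate rather than an exact figure.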

\figureref{fig: posing task algo branch to var} summarizes the implementation of the proposed framework for the VR posing task. We created a task-specific planner for the posing task (\apdxref{apx: impl planner}), consisting of a symbolic planner (for arranging tasks at the sub-goal and pose mimicry levels) and a motion planner (for arranging tasks at the joint mimicry and joint actuation levels). To estimate intention, we create intention variables at three levels: the pose mimicry, joint mimicry, and joint actuation levels (from the highest to the lowest level of the hierarchy). The relevant entities for the pose mimicry tasks are the puppet and the reference object. For lower-level joint mimicry (or joint actuation) tasks, the related entities are specific joints (or arrows) on the puppet and the reference object. See \apdxref{apx:impl state estimate} for more details. As discussed above, related entities at lower levels are subsets of the related entities at higher levels, leading to more spatially localized gaze models at lower levels. The guidance controller chooses from the set of verbal/gestural cues shown in \figureref{fig:posing task cue capability}.

\subsection{Experimental Protocol}\label{sec:exp scheme}

A total of 21 subjects were recruited and partitioned into two groups: the \emph{Automated Group} (7 male and 4 female) and the \emph{Wizard Group} (5 male and 5 female). Subjects in both groups completed two task sessions. For the Automated Group, cues were generated using the proposed framework. In one session our framework had access to both gaze and task-related movements (arrow manipulation commands), whereas in the other session gaze was not observable. The same parameters were used for all subjects across all sessions. For the Wizard Group, a human wizard, who had access to the same set of cues available to our framework, chose the cues. The wizard observed the task progress through a monitor, which showed both the agent's arrow commands and eye gaze in one session and only the arrow commands in the other session. See \apdxref{apdx:wizard setup} for more details. We followed the same procedures for both groups, regardless of whether gaze was used. Subjects in the Wizard Group were unaware of the wizard. We spaced the two sessions several days apart to control for skill build-up and randomized the order of the with-gaze and without-gaze sessions. \apdxref{apx: exp protocol} gives more details on the experimental protocol, which was approved by the Human and Artefacts Research Ethics Committee at the Hong Kong University of Science and Technology as HREP-2021-0193.

\subsection{Evaluation Criteria}\label{sec:eval criteria summary}
To measure the overall usability of Sophia as a helper, each subject was asked to complete the System Usability Scale questionnaire \citep{Brooke1996SUS:Scale} (\apdxref{apx: exp sus questionnaire}) after completing the posing task. We used interviews to evaluate the timeliness and precision of the guidance provided. After completing a posing task, the subject watched a video playback of the session. At the end of each cue provided by Sophia, the subject answered three multiple-choice questions (\apdxref{apx: exp question playback}). The first question asked whether there \emph{should} be a cue. If yes, the second and third questions asked about the timeliness and the precision (level) of the cue. When answering the question regarding timeliness, the subject was instructed to ignore the actual content of the cue and to focus on whether a cue was needed at the time it was issued.
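
For reference, the standard scoring rule of the System Usability Scale \citep{Brooke1996SUS:Scale} is sketched below, assuming that the standard ten-item form and its usual scoring are used. Let $s_1, \dots, s_{10}$ denote the responses to the ten items, each on a 1--5 scale. The overall score is
\[
    \mathrm{SUS} \;=\; 2.5 \left[ \sum_{i \in \{1,3,5,7,9\}} (s_i - 1) \;+\; \sum_{i \in \{2,4,6,8,10\}} (5 - s_i) \right],
\]
which ranges from 0 to 100. Odd-numbered items are positively worded and even-numbered items are negatively worded, so a higher score indicates better perceived usability.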