\section{Further Details of Experiment}\label{apx:exp full detail}

\subsection{Subject and Human Wizard Recruitment}\label{apx: exp recruit}
The subjects and the human wizard are all college students from different fields, covering engineering, natural science, business, and social science. They were informed of the content, estimated duration, and payment of the experiment before they gave consent and registered for the experiment.


\subsection{Curated Data and Privacy}\label{apx: exp privacy}
The curated data set contains no personally identifiable information. The replays of the experiment sessions are in VR view only, without any photo or video footage of the subjects. Both the replays and the answers to the SUS questionnaire and the multiple-choice prompts are indexed by subject number alone. The wizard was informed that there would be a follow-up interview about the opinions and experience formed during the Wizard Group sessions, and we obtained the wizard's consent to present some of the remarks anonymously in this paper.


\subsection{Experiment Protocol}\label{apx: exp protocol}
The same experiment protocol was followed in both sessions of every subject in both groups. It is described here.
\begin{enumerate}
    \item \textbf{Tutorial} A tutorial session is given to every subject before their first task session, during which the subject is directed to complete a posing task that has the same representation as shown in \figureref{fig:posing task htn representation} but consists of simple, mock poses. The purpose of this tutorial session is to familiarize the subject with the task procedures, the arrow manipulation, the role of Sophia, and the forms of cues for each type of task. The mock poses are shown in \figureref{fig:ref-pose-tut}. At the end of the tutorial session it is emphasized that the subject is supposed to actively explore and progress the task, while Sophia observes and attempts to provide guidance as Sophia sees fit.
    \item \textbf{Gaze Calibration} The on-board Tobii eye-tracker of the HTC-Vive VR headset is used for eye-tracking. Calibration is performed after the tutorial session \emph{regardless} of whether the current session uses gaze information. The purpose of the gaze tracking is \emph{not} disclosed to the subject.
    \item \textbf{Briefing} Sophia shuffles and then points out the location-pose pairing for the upcoming posing task session to the subject. Beforehand, it is emphasized that the order in which the sub-goals are achieved may vary but the pairing must be followed correctly, as per the partially ordered task representation in \figureref{fig:posing task htn representation}. Note that the same set of reference poses and locations, as illustrated in \figureref{fig:ref-pose-exp}, is used for all sessions of all groups. The briefing phase is purposefully designed to be quick and vague, thereby creating the need for further guidance at different levels of abstraction later during the execution of the posing task. \href{https://drive.google.com/file/d/1RUr0MqpW8GsFdetVcOF2aIp2y8to3I4J/view?usp=sharing}{This video is a demo of the briefing phase}.
    \item \textbf{Task Execution} After the briefing, the subject carries out the posing task with guidance from Sophia, who is controlled either by our framework or by the human wizard, with or without gaze. The entire task execution is recorded for later replay. \href{https://drive.google.com/file/d/1197bPN0diBbWJ2VpSRCT5UiwdGJpPE5f/view?usp=sharing}{Here is a demo of task execution}.
    \item \textbf{Evaluation} After finishing the posing task, the subject assesses the guidance Sophia provided. First, the subject watches the replay of the task execution they just went through and tries to recall their own state and the progress of the task. At the end of each cue in the replay, the subject answers a set of questions regarding the timeliness and precision of the guidance. This assessment process is illustrated in \href{https://drive.google.com/file/d/13Ms15CEmkuuwrm1oPHHDGx2RMMqpUc-5/view?usp=sharing}{this demo video}, and the questions that the subject is prompted with are detailed in \apdxref{apx: exp question playback}. After watching the replay, the subject answers the SUS questionnaire detailed in \apdxref{apx: exp sus questionnaire}.
    \item \textbf{Reward} The subject is rewarded 50 HKD per hour at the end of each experiment session. In total we spent $2650$ HKD on subject compensation.
\end{enumerate}

\begin{figure}
    \centering
    \subfigure[Mock reference poses used during tutorial sessions.]{\label{fig:ref-pose-tut}%
        \includegraphics[width=.5\textwidth]{Image/MockSetup.JPG}
    }%
    \subfigure[Reference poses used during experiment sessions.]{\label{fig:ref-pose-exp}%
        \includegraphics[width=.5\textwidth]{Image/TaskSetup.JPG}
    }
     \caption{Reference pose setup for tutorial sessions and experiment sessions.}
     \label{fig:ref-pose-illu}
\end{figure}

\subsection{Playback Questions}\label{apx: exp question playback}
While watching the playback, the subject is instructed to recall their own state as well as the task progress. The subject is prompted with the following questions at the end of every cuing action in the playback. The first question attempts to establish whether there should have been guidance at that moment.

\paragraph{\emph{Question 1}} \textbf{Should there be a cue?}\\
\underline{\emph{Question}}: In retrospect, should there have been a cue at that moment?\\
\underline{\emph{Instruction}}: Answer this question by reflecting on your state and the progress of the posing task. Choose one of the options below by considering whether you were consciously seeking help at this moment and whether there was a mistake / an issue hindering task progress.\\
\underline{\emph{Options}}: 
\begin{enumerate}
    \item Yes. I wanted help.
    \item Yes. I was making a mistake or having difficulty.
    \item Yes. Both of the above are true.
    \item No. Sophia should have remained silent.\label{question:shouldn't}
    \item None of the above. Please elaborate.
\end{enumerate}

If the subject selects option \ref{question:shouldn't} in the first question, the evaluation for this cue terminates immediately. If the subject selects any other option, they are then prompted with Questions 2 and 3 below.

\paragraph{\emph{Question 2}} \textbf{Timing of the Cue}\\
\underline{\emph{Question}}: Please evaluate the TIMING of the cue.\\
\underline{\emph{Instruction}}: Answer this question without considering the exact content of the cue. Instead choose from the options below by considering whether the timing of this cue is appropriate given the issue / your wish of getting help identified in Question 1. E.g., maybe you have been looking for help for a long time, or maybe Sophia could wait till you finish what you have at hand before addressing an existing issue.\\
\underline{\emph{Options}}: 
\begin{enumerate}
    \item The cue should have come sooner.
    \item The cue came at an appropriate time.
    \item The cue should have come later.
    \item None of the above. Please elaborate.
\end{enumerate}

\paragraph{\emph{Question 3}} \textbf{Content of the Cue}\\
\underline{\emph{Question}}: Please evaluate the CONTENT of the cue.\\
\underline{\emph{Instruction}}: Answer this question by comparing the content of the cue with the issue present / the help sought after identified in Question 1 and choose from the options below. E.g., pointing out what joint to correct next might be too low-level when the issue is your forgetting the target pose, whereas pointing at the target pose is probably too generic to help you choose the correct arrow to use next.\\
\underline{\emph{Options}}: 
\begin{enumerate}
    \item The information given was too ABSTRACT or GENERIC.
    \item The information was given at the correct level of detail.
    \item The information given was too LOW LEVEL.
    \item The information given was IRRELEVANT.
    \item None of the above. Please elaborate.
\end{enumerate}

\subsection{System Usability Scale}\label{apx: exp sus questionnaire}
The System Usability Scale (SUS) has been widely used in usability studies \citep{1996SUS:Scale,Lewis2018TheFuture}. In our experiment we use SUS for an overall evaluation of how well Sophia performs as a helper providing guidance. The instructions we give and the questionnaire answered by the subjects are shown below. The responses are normalized following the procedure described in \citep{1996SUS:Scale}.
 
 \underline{\emph{Instruction}}: Please answer each item by marking a number to indicate how much you agree with each statement. Answer all items even if unsure of your answer. Note that ``this system'' stands for the Sophia robot serving as a helper in the task in the questions below.\\
\underline{\emph{Questions}}:
\begin{enumerate}
    \item I think that I would like to use this system frequently.
    \item I found the system unnecessarily complex.
    \item I thought the system was easy to use.
    \item I think that I would need the support of a technical person to be able to use this system.
    \item I found the various functions in this system were well integrated.
    \item I thought there was too much inconsistency in this system.
    \item I would imagine that most people would learn to use this system very quickly.
    \item I found the system very cumbersome to use.
    \item I felt very confident using the system.
    \item I needed to learn a lot of things before I could get going with this system.
\end{enumerate}
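
For reference, the normalization procedure of \citep{1996SUS:Scale} can be sketched as follows. This sketch assumes the standard five-point response scale, with $r_i \in \{1, \dots, 5\}$ denoting the response to item $i$; the symbols $r_i$ and $S$ are introduced here for illustration only. Odd-numbered items are positively worded and even-numbered items are negatively worded, so the normalized score is
\[
S = 2.5 \left( \sum_{i \in \{1,3,5,7,9\}} (r_i - 1) + \sum_{i \in \{2,4,6,8,10\}} (5 - r_i) \right),
\]
which lies in $[0, 100]$. As a hypothetical example, a subject who answers $4$ on every odd-numbered item and $2$ on every even-numbered item obtains $S = 2.5 \, (5 \cdot 3 + 5 \cdot 3) = 75$.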




\begin{figure}
    \centering
    \subfigure[Overview of the workspace of the human wizard.]{\label{fig:wizard overview}%
        \includegraphics[width=\textwidth]{Image/wizard-setup-overview.JPG}
    }     
    \subfigure[Close view of the wizard's monitor screen where critical information of the posing task's execution is displayed.]{\label{fig:wizard monitor view}%
        \includegraphics[width=\textwidth]{Image/wizard monitor.JPG}
    }       
     \caption{Illustration of the human wizard's work setting.}
     \label{fig:wizard-setup-image}
\end{figure}


\subsection{Wizard-of-Oz Setup}\label{apdx:wizard setup}
\figureref{fig:wizard overview} gives an overview of the Wizard-of-Oz setup. The human wizard sits in a location hidden from the subject, monitoring the progress of the posing task on the screen and invoking cuing actions through a keyboard. The location-pose pairing and the difference between the puppet and the target pose can be seen on the monitor, as shown in \figureref{fig:wizard monitor view}. The observations available to the wizard are restricted to exactly those available to our proposed framework. \href{https://drive.google.com/file/d/1I0wzWZxyJs4vTOpjlg6K7tO36xf1OlEG/view?usp=sharing}{The demo video here} illustrates the difference: for sessions without gaze the wizard can only see the arrow manipulation made by the subject, whereas for sessions with gaze the wizard sees both the gaze point and the arrow manipulation. In either case, no other behavioral information from the subject is available to the wizard.

To offer guidance to the subject, the wizard is trained to remember the correspondence between key-strokes and cuing actions; a list is also available for reference during the experiment. The wizard can either choose the level of abstraction of the cue and reuse our proposed algorithm's optimal branch to select the exact task that Sophia cues, or manually specify exactly which task Sophia should talk about. Although this is not the ideal interface for a human wizard, later remarks from the wizard suggest that the interface was not a major constraint.

During training the wizard tried the posing task repeatedly, taking the role of the subject as well as the role of the wizard. In the former case the wizard also completed the evaluation by watching the replay and answering the questions (\apdxref{apx: exp question playback}, \apdxref{apx: exp sus questionnaire}). Through this process the wizard became familiar with the evaluation criteria of the posing task.

The wizard is paid a base rate of 50 HKD per hour for both training and experiment sessions. To create a further incentive, the wizard was informed that the base reward would be scaled according to the quality of guidance as assessed by subjects from the Wizard Group. Specifically, if the average ratio of timely and precise cues in the Wizard Group sessions with gaze is $x$ percent higher than that of the Automated Group, the wizard's reward for those sessions (again with gaze) is multiplied by $1 + 0.1x$. The same reward scheme is adopted for sessions without gaze, except that the multiplier is changed to $1 + 0.01x$. The exact multipliers were not disclosed to the wizard until the Wizard Group experiments had been completed. In total we paid the wizard $2060$ HKD. Note that during Wizard Group sessions the wizard was \emph{not} present when the subjects performed the evaluation. Instead, the wizard was informed of the subjects' opinions and granted access to the replays only \emph{after} finishing \emph{all} Wizard Group sessions.
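
As a worked illustration of the two multipliers described above, take $x = 5$. This value is purely hypothetical and is not an experimental result, and we assume here that the multiplier is applied directly to the base hourly rate. The wizard's effective hourly rate would then be
\[
50 \times (1 + 0.1 \times 5) = 75 \ \mathrm{HKD}
\]
for sessions with gaze, and
\[
50 \times (1 + 0.01 \times 5) = 52.5 \ \mathrm{HKD}
\]
for sessions without gaze.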





