\section{Further Details Regarding the Implementation of the Proposed Framework in the Posing Task}\label{apx:posing task impl detail}
\subsection{Planner}\label{apx: impl planner}
\paragraph{Task and Motion Planners} As illustrated in \figureref{fig:posing task htn representation}, the HTN representing the posing task consists of both symbolic tasks (sub-goals and pose mimicry tasks), which have no direct kinematic meaning, and motion tasks (joint mimicry tasks and joint actuation tasks), which can be mapped to re-configurations of the puppet. The planner therefore needs both symbolic task planning and motion planning. For symbolic planning, as stated in \secref{sec:generic planner}, we use the HATP planner proposed in \citep{Lallement2014HATP:Robotics}, released under the 2-clause BSD license, to solve the symbolic part of the posing task given its progress so far. Optimality is measured by the total amount of rotation and translation of the puppet that a plan requires. The optimal branch of the partial plan found by HATP is then sent to the motion planner, since it still lacks the joint mimicry and joint actuation tasks the person is expected to execute. By default, at the joint mimicry level the planner chooses as the next step the joint on the puppet that differs most from the target pose. At the joint actuation level, if the joint of interest is prismatic, the plan can be found trivially. When planning for a revolute joint, on the other hand, the planner decides which arrow the subject would use first by enumerating all possible conventions of Euler angle decomposition for the rotation needed at the joint and choosing the first step of the Euler angle sequence with minimal total rotation. With tasks at these two levels appended, the framework has the complete optimal branch, consisting of the tasks the subject is expected to perform at all levels of the hierarchy.
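As a minimal sketch of the two default motion-planning rules above, and not necessarily the exact formulation used in the implementation, let $q_j$ and $q_j^{\ast}$ denote the current and target configuration of joint $j$, $d(\cdot,\cdot)$ a distance between joint configurations, $R$ the rotation needed at a revolute joint, and $\mathcal{S}$ the set of Euler angle conventions (for instance the twelve axis orderings such as $xyz$ or $zxz$). All of these symbols, and the use of the sum of absolute angles as the measure of total rotation, are assumptions introduced here for illustration:
\begin{align*}
    j^{\dagger} &= \arg\max_{j} \; d(q_j, q_j^{\ast}), \\
    s^{\dagger} &= \arg\min_{s \in \mathcal{S}} \; \sum_{k=1}^{3} \bigl|\theta_k^{(s)}\bigr|, \quad \text{where } R = R_{s_1}\bigl(\theta_1^{(s)}\bigr)\, R_{s_2}\bigl(\theta_2^{(s)}\bigr)\, R_{s_3}\bigl(\theta_3^{(s)}\bigr).
\end{align*}
Under this reading, the next joint mimicry task targets joint $j^{\dagger}$, and the first expected joint actuation task corresponds to the first elemental rotation of the sequence $s^{\dagger}$.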
\paragraph{Re-plan} The re-plan behaviour of the planner when the world state is updated is also two-fold. At the symbolic level, if the subject completes a pose mimicry task, such as moving the puppet to a location, whether expected or unexpected, the planner re-plans by first invoking HATP and then the motion planner. In contrast, should the subject finish mimicking a joint or choose to first adjust a joint other than the expected one, the planner re-plans by sending the same symbolic plan to the motion planner again. Similarly, if the subject veers off the expected joint actuation sequence, the planner computes a new sequence given the expected joint mimicry task and the current joint configuration.
\paragraph{Think} To model the thinking process of subjects upon task completion, a think-phase is added whenever the subject completes a task, during which intention estimation is skipped.

\subsection{Intention Estimation}\label{apx:impl state estimate}
\paragraph{Discretized Features}
In the posing task we set 
\begin{align}
    \label{eq:posing task def-config-feature}
    c^t = \begin{cases}
      -1& \text{if no arrow manipulation is observed}\\
      0 & \text{if the arrow manipulation is inconsistent w.r.t. joint actuation task } A_N\\
      1 & \text{if the arrow manipulation is consistent w.r.t. joint actuation task } A_N
    \end{cases}
\end{align}
Similarly, for each task $A_i$,
\begin{align}
    \label{eq:posing task def--gaze-feature}
    g^t = \begin{cases}
      -1 & \text{if no gaze is observed}\\
      0 & \text{if looking at irrelevant entities w.r.t. task } A_i\\
      1 & \text{if looking at relevant entities on the puppet w.r.t. task } A_i\\
      2 & \text{if looking at relevant entities on the reference object w.r.t. task } A_i
    \end{cases}
\end{align}
where, as shown in \figureref{fig:entity specification}, ``relevant entities'' could mean the puppet / reference object as a whole, or a joint / arrow on the puppet or the reference object, depending on the task that $A_i$ stands for. To determine whether the gaze of the subject falls on one of those entities, we use both a ray-cast method \citep{Wang2017UsingEstimation} and an angular distance threshold.
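One way the two tests could be written down, as a sketch under assumed notation rather than the implemented procedure: let $o$ be the gaze origin, $\hat{d}$ the unit gaze direction, $p_e$ a representative point (for example the centre) of entity $e$, and $\theta_{\max}$ an angular threshold. None of these symbols is defined in the text above, and combining the two tests with a logical ``or'' is likewise an assumption:
\[
    \alpha_e = \arccos\left( \frac{\hat{d}\cdot(p_e - o)}{\lVert p_e - o\rVert} \right), \qquad
    \text{gaze on } e \iff \bigl(\text{the ray } o + s\hat{d},\ s \ge 0, \text{ hits } e\bigr) \ \text{ or } \ \alpha_e \le \theta_{\max}.
\]
An angular test of this kind can tolerate small gaze-tracking errors around small entities such as arrows, where a ray-cast alone may miss.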

\paragraph{Markov Chain Parameter}
As shown in \figureref{fig: posing task algo branch to var}, both $P( \{g^t\}_{t=1}^{T} | x_i)$ and $P( \{c^t\}_{t=1}^{T} | x_i)$ are modeled by finite-state Markov chains. We follow empirical rules to select the parameters of these Markov chains. For modeling arrow manipulation with $P( \{c^t\}_{t=1}^{T} | x_i)$, the subject is more likely to manipulate arrows correctly when executing the right (joint actuation) task, i.e., $P( c^t=1 | x_i = 1 , c^{t-1}=\cdot)> P( c^t=1 | x_i = 0 , c^{t-1}=\cdot)$. Further, there should be at least some arrow manipulation from the subject when executing the right task, hence $P( c^t=-1 | x_i = 1 , c^{t-1}=-1) < P( c^t=-1 | x_i = 0 , c^{t-1}=-1)$, which effectively acts as a timeout if no arrow manipulation has been observed for a long time. For modeling gaze with $P( \{g^t\}_{t=1}^{T} | x_i)$, the subject is unlikely to keep fixating on entities on the puppet, entities on the reference object, or irrelevant entities for a prolonged period if he or she is executing the right task, i.e., $P( g^t=g^{t-1} | x_i = 1 , g^{t-1}) < P( g^t=g^{t-1} | x_i = 0 , g^{t-1})$. Instead, when executing the right task the subject's gaze is expected to alternate between entities on the puppet and entities on the reference object, i.e., $P( g^t = 1 | x_i = 1 , g^{t-1} = 2) > P( g^t = 1 | x_i = 0 , g^{t-1} = 2)$ and $P( g^t = 2 | x_i = 1 , g^{t-1} = 1) > P( g^t = 2 | x_i = 0 , g^{t-1} = 1)$.
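To make the constraints on the arrow-manipulation chain concrete, the following is an illustrative sketch. The factorization is the standard one for a first-order Markov chain; the transition matrices $M^{(1)}$ (for $x_i=1$) and $M^{(0)}$ (for $x_i=0$), with rows indexed by $c^{t-1}$ and columns by $c^{t}$, both in the order $-1, 0, 1$, contain hypothetical values chosen only to satisfy the inequalities above and are not the parameters used in the implementation:
\begin{align*}
    P( \{c^t\}_{t=1}^{T} | x_i) &= P( c^1 | x_i) \prod_{t=2}^{T} P( c^t | x_i , c^{t-1}), \\
    M^{(1)} &= \begin{pmatrix} 0.6 & 0.1 & 0.3 \\ 0.2 & 0.2 & 0.6 \\ 0.2 & 0.1 & 0.7 \end{pmatrix}, \qquad
    M^{(0)} = \begin{pmatrix} 0.8 & 0.15 & 0.05 \\ 0.3 & 0.5 & 0.2 \\ 0.3 & 0.4 & 0.3 \end{pmatrix}.
\end{align*}
With these values, and assuming equal initial probabilities $P( c^1=-1 | x_i=1) = P( c^1=-1 | x_i=0)$, a run of $T$ consecutive observations $c^t=-1$ gives the likelihood ratio
\[
    \frac{P( \{c^t=-1\}_{t=1}^{T} | x_i=1)}{P( \{c^t=-1\}_{t=1}^{T} | x_i=0)} = \left(\frac{0.6}{0.8}\right)^{T-1} = 0.75^{\,T-1},
\]
which is about $0.056$ for $T=11$. The evidence for $x_i=1$ thus decays geometrically while no arrow manipulation is observed, which is the timeout behaviour described above.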


\begin{figure}
     \centering
     \subfigure[Related entity specification for a pose mimicry task. If the gaze falls on any other entities in the scene the feature value would be set to $g^t=0$, i.e., irrelevant to the task. If no gaze is captured, $g^t=-1$.]{\label{fig:action level specification}%
        \includegraphics[width=.8\textwidth]{Image/related-entity-action-level.JPG}
     }
      \subfigure[Related entity specification for a joint mimicry task. If the gaze falls on any other entities in the scene the feature value would be set to $g^t=0$, i.e., irrelevant to the task. If no gaze is captured, $g^t=-1$.]{\label{fig:joint level specification}%
        \includegraphics[width=.8\textwidth]{Image/related-entity-joint-level.JPG}
     }
      \subfigure[Related entity specification for a joint actuation task. If the gaze falls on any other entity in the scene \emph{when a group of arrows is visible}, the feature value is set to $g^t=0$, i.e., irrelevant to the task. If no gaze is captured or no arrow is shown, $g^t=-1$. Note that, as shown in \href{https://drive.google.com/file/d/1197bPN0diBbWJ2VpSRCT5UiwdGJpPE5f/view?usp=sharing}{this demo video}, the interactive arrows do not appear until the wand has been moved close to them. Also note that when a pair of arrows is available for the same direction of rotation, both are treated the same when assigning gaze features.]{\label{fig:arrow level specification}%
        \includegraphics[width=.8\textwidth]{Image/related-entity-arrow-level.JPG}
     }    
     \caption{Related entity specification for assigning gaze feature values. Note that the large pose object in front is not available until Sophia has offered a joint-level or arrow-level cue, as shown in \figureref{fig:posing task cue capability}.}
     \label{fig:entity specification}
\end{figure}


\subsection{Unity Assets}\label{apx:unity asset list}
Here we list the assets used by the implementation of the framework in the posing task, with hyperlinks pointing to their corresponding Unity Asset Store pages. All assets are under the Standard Unity Asset Store EULA.
\begin{itemize}
    \item \href{https://assetstore.unity.com/packages/tools/gui/goodrect-popup-149497}{Goodrect Popup}
    \item \href{https://assetstore.unity.com/packages/tools/camera/ultimate-replay-2-0-178602}{Ultimate Replay 2.0}
    \item \href{https://assetstore.unity.com/packages/tools/animation/animancer-lite-116516}{Animancer Lite}
    \item \href{https://assetstore.unity.com/packages/3d/props/weapons/3d-items-free-wand-pack-46225}{3D Items - Free Wand Pack}
    \item \href{https://assetstore.unity.com/packages/3d/characters/humanoids/hyper-casual-character-stickman-sphere-head-161922}{Hyper-Casual Character Stickman sphere head}
    \item \href{https://assetstore.unity.com/packages/3d/primitives-3197}{Primitives}
    \item \href{https://assetstore.unity.com/packages/3d/props/lowpoly-arrows-pack-191184}{LowPoly Arrows Pack}
\end{itemize}